Protecting traffic privacy for massive aggregated traffic


Computer Networks 77 (2015) 1–17


Alfonso Iacovazzi*, Andrea Baiocchi
DIET, Sapienza University of Rome, Via Eudossiana 18, 00184 Rome, Italy

Article history: Received 7 June 2014; received in revised form 10 November 2014; accepted 28 November 2014; available online 4 December 2014.

Keywords: privacy; traffic masking; traffic flow classification; padding; fragmentation; Internet traffic.

Abstract. Traffic analysis has definitively shown that encryption is not enough to protect the privacy of communications carried over packet networks. The very features of packet traffic (packet length statistics, inter-packet times, volumes of exchanged traffic, communication patterns) leak information. The leakage ranges from the kind of application that generates the information flow carried in the supposedly secure connection to parts of its content. We propose traffic masking as a countermeasure. Full confidentiality protection is discussed and the traffic masking framework is introduced and motivated. The optimization and performance of the masking device are evaluated both through a general analytical model, mainly useful to gain basic insight, and by a real network emulation of a distributed secure multiparty computation application, where confidentiality requirements are key to the application itself. It is shown that essentially full confidentiality can be attained for a practical distributed security application by accepting an increase of the traffic volume by a factor of 2.4 and an increase of the task completion time of 30%. Hence, (almost) full privacy appears to be more appealing in contexts where delay guarantees are valued more than bandwidth.

© 2014 Elsevier B.V. All rights reserved.

1. Introduction

The information content of isolated messages exchanged between two parties can be safeguarded by means of encryption, but it is much harder to protect sender and/or receiver confidentiality in a broader sense when exchanging a structured flow of packetized information. Traffic analysis is an example of how to exploit side-channel information leakage and infer as much knowledge as possible about the observed message flow. In [1] Raymond provides a comprehensive overview of possible attacks that can be carried out using traffic analysis. As also pointed out in [2], the main traffic analysis attacks exploit information regarding timing, communication patterns, packet counting/volume, and on-line/off-line periods in the communication. These types of attacks have been known for decades, and numerous studies have shown their effectiveness in various networking scenarios. Overall, it appears that tools for protecting the confidentiality of packetized information flows still fall short of what is required.

Among the different confidentiality-breaking attacks, traffic classification aims at identifying the application protocol within a given set of possibilities. Several approaches are used to achieve this goal, e.g., see the surveys and review works [3–8] and references therein. Other types of confidentiality breaking of supposedly end-to-end secure channels have been demonstrated via statistical analysis of side-channel features. Several studies in the literature, e.g., [9–13], deal with web-page fingerprinting. These works point out that downloaded web pages can be identified even if an encrypted and authenticated channel is used.


Identifying sensitive information in healthcare and finance [14], recognizing the language of the conversation in encrypted VoIP transactions [15], recovering partial transcripts of an encrypted VoIP conversation when the underlying audio is encoded with Variable Bit Rate codecs [16], and predicting a user's contextual location [17] are further examples of how one can steal information by exploiting side-channel leakage due to packet lengths, inter-packet times, packet directions, and communication patterns.

In order to thwart traffic analysis, several research groups have explored techniques aimed at hiding side-channel information. An early contribution is [18], where models and trade-offs of anonymity-providing algorithms are discussed for application flows with low latency requirements. Adding a random amount of padding to each packet of a flow is the main countermeasure exploited in secure channel protocols such as SSH, TLS and IPsec, but it can be readily verified that this method does not lead to an effective masking of revealing traffic features [19]. Convex optimization techniques are used in [20] to modify the original packet length distribution so that it looks like a target distribution (morphing), with minimum overhead. In [21] the optimum padding for masking packet lengths only is found, while in [22] the same authors explore minimum-overhead traffic masking algorithms for several features and the trade-off between traffic confidentiality protection and masking cost. The algorithms developed in these works make fragmentation and padding decisions based on a priori knowledge of the statistics of the traffic to be protected. This implies that the masking device must have sufficiently large and quickly accessible memory.
A line of literature has developed around anonymous communications, whose purpose is to guarantee (to some degree) the unlinkability of packets and users, i.e., it must not be feasible to associate a packet captured in the network with a specific user, and vice versa. Contributions in this context are [23–26]. In [23] intermediate relay nodes are used, together with cover traffic, to anonymize user communications. Operation is carried out at the IP level. Tunnels are created according to a uniform topology of relay nodes. Peer and route selection is done so as to guarantee that connections among peers are independent, in topology and carried traffic intensity, of the actual user communication patterns. The DISSENT client-server architecture is proposed in [24], where client users are provided anonymity as long as there is at least one honest server in the set they select, yet they need not know which one is honest. The major drawback of the approach of [24] is scalability, in the face of the sophisticated cryptographic tools and the related processing that DISSENT requires. In [25,26] the authors propose a peer-to-peer, low-latency anonymous network that is resistant to traffic analysis and overcomes the limitations of previous architectures. Besides providing a useful classification and review of many previous contributions, those works propose the Aqua architecture, based on 'circuits' of relay nodes, able to provide sender and receiver anonymity. Again, key to the Aqua architecture is traffic obfuscation obtained by means of cover traffic among the relay nodes.

Other works focus on privacy protection for specific applications. To achieve anonymity against web-page fingerprinting, Yu et al. [27,28] implemented a new packet padding strategy aiming at perfect anonymity for web browsing. Their solution reduces overhead by using, as padding, web pages that the user is predicted to download in the future. Luo et al. [29] propose to modify the statistical features of a traffic flow by using a set of transformations at both the application and the transport layer (e.g., packet padding and/or fragmentation, HTTP Range, TCP Maximum Segment Size (MSS) negotiation, modification of the TCP advertised window, HTTP pipelining). These techniques are designed only for web browsing flows and cannot be applied to the traffic generated by other applications. A traffic reshaping technique is used by Zhang et al. [30]. In their method an application flow is dynamically split into a set of new flows, which are dispatched among multiple virtual MAC interfaces. Traffic features are reshaped on each virtual interface to hide those of the original traffic. In [19] the authors give a thorough review of some practical confidentiality-preserving techniques against traffic analysis, highlighting their main weaknesses. Despite the efforts made in recent years to hide side-channel information in security protocols, Dyer et al. show that it is still possible to classify traffic flows after masking, even though they only examined some of the techniques known in the literature.

We address the protection of the confidentiality of traffic flows in a packet network against traffic analysis by introducing traffic masking on the whole packet flow transferred between two given endpoints. As an example, this could be the flow belonging to an SSL/TLS connection or the aggregated traffic carried through an IPsec tunnel.
Our proposed traffic masking is based on resampling packet sizes and inter-packet times from probability distributions that are independent of the original packet sizes and inter-packet times. We define an analytical model to gain insight into the performance trade-offs of the traffic shaper (masking device). Then, we apply our approach to a secure multiparty computation application in a real network emulation testbed. We show that a performance benefit can be gained by trading a limited amount of leakage against the overhead and delay introduced by the masking device. Unlike other works, the approach explored in our study is particularly suitable for aggregated flows carrying massive volumes of traffic, although it can just as well be applied to individual traffic streams. Moreover, it minimizes the required processing, and is thus viable also in cheap, low-end networking equipment. Specifically, we claim the following main contributions:

- we provide models to gain insight into the overhead-delay trade-off of tunnels that provide full traffic obfuscation;
- we propose and optimize a dual-buffer strategy to improve the overhead-delay trade-off;
- we prove the effectiveness of our framework on real network application traffic.


The third point is a key added value with respect to customary trace-driven simulations, where the performance of traffic masking algorithms is tested by feeding open-loop packet traffic streams, possibly collected in operational networks. Since traffic masking algorithms act on the traffic in transit and introduce delay, there is a closed-loop interaction between traffic masking and the evolution of the application-level state machine; hence the traffic offered to the masking algorithm is modified by the insertion of the masking algorithm itself. We account for this cross-layer interaction by developing a full-blown application and networking test environment.

Our results can be useful to design secure VPNs with augmented confidentiality levels. The traffic-masked tunnel we define is also a key building block of anonymity networks, as pointed out, e.g., in [23,25]. We provide an approach to realize those traffic-obfuscated channels and to improve their overhead-delay trade-off.

In the rest of this paper we state the traffic confidentiality model in Section 2. Section 3 is devoted to an analytical model that gives insight into the design of the traffic masking device. A case study is analyzed in Section 4, focusing on a real application, a secure multiparty computation one, in a real network emulation testbed. Traffic masking performance is discussed for the single and double buffer masking algorithms in Section 5. Finally, conclusions are drawn in Section 6.

2. Protection against traffic analysis with traffic masking

The reference networking scenario is illustrated in Fig. 1. It comprises two subnetworks $\mathcal{N}_A$ and $\mathcal{N}_B$ which are connected by means of an insecure network $\mathcal{N}_C$. The traffic exchanged between the two subnetworks $\mathcal{N}_A$ and $\mathcal{N}_B$ is conveyed via a secure tunnel (encrypted and authenticated), established between two edge nodes $A \in \mathcal{N}_A$ and $B \in \mathcal{N}_B$, e.g., an IPsec tunnel, an SSL/TLS connection, or an SSH connection (the setting is the same if more than one tunnel is established between $\mathcal{N}_A$ and $\mathcal{N}_B$ at a given time, by just referring to each tunnel separately). We focus on countermeasures against traffic analysis attacks on the packet flow between A and B. We refer to this kind of protection as traffic masking.

Fig. 1. Scheme of the reference networking scenario.

2.1. Adversary model and security metric

We assume a passive adversary with full access to the network $\mathcal{N}_C$. The adversary can observe the traffic carried through $\mathcal{N}_C$, specifically the traffic exchanged between the two edge nodes A and B. The adversary has access only to traffic carried over $\mathcal{N}_C$; the edge subnetworks $\mathcal{N}_A$ and $\mathcal{N}_B$ are either trusted or protected against traffic analysis (the tagged tunnel between A and B can also be part of a protected path through several subnetworks, made up of a cascade of protected tunnels). As an example, a secure VPN tunnel connecting two corporate private networks through the public Internet fits this scenario. The adversary is able to perform traffic analysis on the packet flow exchanged between A and B, aiming at any of the targets cited in Section 1. For instance, the adversary can analyze features of the observed traffic, such as packet lengths, inter-packet gap times, and the volume of traffic exchanged over time. These data can be used to gather information about the activity of one or more hosts, e.g., to identify the application generating some portion of the traffic or to (at least partially) break the confidentiality of the exchanged information. We assume the edge nodes A and B are trusted and cannot be manipulated by the adversary, as is typical of security functions provided by means of a tunnel through an insecure network. Attacks on tunnel establishment (including key establishment), tunnel encryption or authentication are out of the scope of this work, as are DoS attacks on the edge nodes. All of these attacks can be contained by resorting to cryptographic primitives or known DoS countermeasures.

Let us consider the two endpoints A and B, within the subnetworks $\mathcal{N}_A$ and $\mathcal{N}_B$ respectively, and a packet flow exchanged between them. For ease of language, the term packet is used to refer to the data units of the traffic flow, even if they could belong to a layer different from the network one. Let $X_k$ and $T_k$ be the length of the $k$-th packet and the time elapsing between the $(k-1)$-th and the $k$-th packets (the $k$-th inter-packet gap time), as measured at an edge node. We use a superscript $(A)$ or $(B)$ to distinguish quantities related to the traffic flow originated at endpoint A or B, respectively. Since we focus on the direction $A \to B$, we omit the superscript $(A)$ when referring to variables of the traffic coming from A, provided there is no ambiguity. We assume both the $X$'s and the $T$'s can be modeled as drawn from wide-sense stationary processes, so that for any $k$ we have $X_k \sim X$ and $T_k \sim T$, with $X$ and $T$ being random variables with at least finite first two moments.

We observe that essentially all traffic analysis attacks use in some way information derived from the sequences $\{X_k^{(A)}\}_{k\in\mathbb{Z}}$, $\{T_k^{(A)}\}_{k\in\mathbb{Z}}$, $\{X_k^{(B)}\}_{k\in\mathbb{Z}}$ and $\{T_k^{(B)}\}_{k\in\mathbb{Z}}$ as input, plus any prior knowledge that the adversary might have on the subnetworks $\mathcal{N}_A$ and $\mathcal{N}_B$. As an example, classification of application protocols exploits packet lengths and


time gaps of the packets making up each application session. This motivates the formal definition of our security objective: it is a measure of how much the adversary can learn about the packet lengths and gap times, given the information leaked by the packet flow running between A and B.

Let $\Phi^{(A)} \equiv \{X_k^{(A)}, T_k^{(A)}\}_{1 \le k \le m_A}$ be the set of packet lengths and gap times that refer to the packets carried from A to B through the tunnel in a given observation time interval; let $\Phi^{(B)} \equiv \{X_k^{(B)}, T_k^{(B)}\}_{1 \le k \le m_B}$ be the analogous set of random variables for the B to A direction in the same observation time. Let also $\Phi = \{\Phi^{(A)}, \Phi^{(B)}\}$. The endpoints A and B transform the traffic to be carried into the tunnel by means of a traffic obfuscation function.

Fig. 2. Sketch of the end-to-end connection with traffic masking: definition of in and out inter-packet gap times and packet lengths.

The metric related to the traffic analysis attack can be defined after [31] by using the conditional uncertainty on $\Phi$ given: (i) side information $\Psi$ on $\Phi$, known to the adversary; (ii) leakage information $\Omega$ on $\Phi$, gained by the adversary by means of the observed traffic. The degree of traffic masking is measured by the normalized equivocation $E$:

$$E = \frac{H(\Phi \mid \Psi, \Omega)}{H(\Phi \mid \Psi)} \qquad (1)$$

where $H(Z)$ is the average entropy of the random variable $Z$. Perfect traffic obfuscation, given the context, corresponds to $E = 1$: in that case, the observed leakage $\Omega$ does not yield any additional information about $\Phi$ beyond that provided by the prior probability distribution of $\Phi$ assumed by the adversary, given what the adversary already knows, i.e., $\Psi$.

2.2. Traffic masking

A general scheme of the masking device at either edge node is given in Fig. 2. The A-B connection depicted in this figure could represent a tunnel between IPsec security gateways, or between two application endpoints of an SSL/TLS connection, or between two mixes of an anonymity network, according to the architectures presented in, e.g., [23,25]. Let $Y_k$ and $U_k$ be the sequences of packet lengths and inter-packet time intervals chosen for the masked traffic flow. The output packet departure times are denoted by $t_{d,k}$, with $t_{d,k} = t_{d,k-1} + U_k$. The masking device shapes the original input traffic flow so as to impose that the $k$-th output packet has payload length $Y_k$ and is sent at time $t_{d,k}$.

The logical block structure of the packet masking device is shown in Fig. 3. First, an input packet of length $X$ is broken into $N$ new packets of lengths $Y_1, \dots, Y_N$, drawn from the output packet length probability density function (PDF) $f_Y(\cdot)$. The number of output fragments $N$ is the least integer satisfying both $Y_1 + \dots + Y_{N-1} < X$ and $Y_1 + \dots + Y_N \ge X$. The bytes of the input packet are carried by the newly generated fragments. A header of length $H$ is added to each fragment to form the output packet; the header carries the information required to reassemble the original packet at the other end of the secure channel. The new packets are enqueued in a FIFO buffer. At each output departure time $t_{d,k}$, a packet is taken from the buffer and sent to the output. If the buffer is empty, a dummy packet is sent to the output, again with length drawn from the PDF $f_Y(\cdot)$. The overhead sources are fragmentation, padding and dummy packets.

Fig. 3. Masking device with packet length and gap times masking.

The masking device can be optimized by choosing the PDFs $f_Y(\cdot)$ and $f_U(\cdot)$ so as to minimize the expected delay through the device for a given fraction of overhead in the output packet flow. In [32] the following result is proved: given that the original packet lengths and gap times are replaced by random variables independent of the original ones, the optimal masking device is a traffic shaper that outputs fixed-size packets at fixed times.

The masking device can be practically implemented at different architectural levels, ranging from packet level to application level, depending on the scope and span of the protection. Reduction of packet lengths to the desired value can be obtained by means of fragmentation and padding, which can be readily done, e.g., fragmentation and reassembly are natively provided by IP. Continuous emission at fixed times can be realized, e.g., by using a leaky bucket logic, as already implemented in the Linux OS, together with a dummy packet generator. The redundancy introduced by dummy packets can be exploited to apply network coding to the covered packet flow: in case of congestion, network routers can discard a fraction of the packets of the masked flow without hurting the integrity of the information flow (though possibly impacting the reconstruction delay at the network decoder).

3. The confidentiality-efficiency trade-off of the masking device

In this section we introduce a model of the masking device simple enough to lend itself to a thorough analysis, so as to understand: (i) the overhead-delay trade-off; (ii) how that trade-off can be improved by leaking some controlled information on the input traffic flow.
The values of the masking device parameters are found by minimizing the mean delay $E[D]$ through the device under the constraint that the average overhead fraction $\eta$ of the output link be given.
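Before turning to the model, the shaping operation of Section 2.2 (fragmentation to a fixed payload, FIFO buffering, dummy packets in idle slots) can be sketched in a few lines. This is a toy discrete-time sketch of ours: the function names, the example packet list and the parameter values ($Y_0 = 128$ bytes, $U_0 = 1$ ms) are illustrative, not taken from the paper.

```python
import math
from collections import deque

def fragment(x_len, y0):
    """Number of fixed-size output fragments for an input payload of
    x_len bytes: the least N with N * y0 >= x_len (Section 2.2)."""
    return max(1, math.ceil(x_len / y0))

def mask(arrivals, y0=128, u0=0.001, horizon=0.01):
    """Shape (arrival_time, length) packets into a fully masked stream:
    one fixed-size packet every u0 seconds, dummy packets fill idle slots.
    Returns a list of (departure_time, kind) output packets."""
    queue = deque()                        # FIFO of real fragments
    out = []
    i = 0
    for k in range(round(horizon / u0)):   # slotted output time axis
        t = k * u0
        # enqueue the fragments of every packet arrived by slot start t
        while i < len(arrivals) and arrivals[i][0] <= t:
            queue.extend(["data"] * fragment(arrivals[i][1], y0))
            i += 1
        out.append((t, queue.popleft() if queue else "dummy"))
    return out

pkts = [(0.0005, 300), (0.0021, 1500)]
stream = mask(pkts)   # every slot emits exactly one fixed-size packet
```

Whatever the input, the observable output is one $(H + Y_0)$-byte packet every $U_0$ seconds, so packet lengths and gap times leak nothing; the cost is the dummy and padding overhead that the model below quantifies.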


Let us consider a masking device with output link capacity $C$, receiving an input packet flow described by inter-arrival times $T_k$ and packet lengths $X_k$. We also let $\lambda \equiv 1/E[T]$.

3.1. Single buffer traffic masking device

Let us first consider a single buffer masking device, according to the block scheme outlined in Fig. 3. The masking device shapes the traffic by emitting packets with payload length $Y_0$, header length $H$, and fixed inter-packet gap time $U_0$, so that $C = (H + Y_0)/U_0$. The average overhead fraction is

$$\eta = 1 - \frac{\lambda E[X]\, U_0}{H + Y_0} \qquad (2)$$

Given the fixed payload length and fixed inter-packet gap time of the output packets, the single buffer masking device can be modeled as a time-slotted, batch-arrival, deterministic-service-time queue. If we assume that input packets arrive according to a Poisson process of mean rate $\lambda$ and that input packet lengths are i.i.d. random variables $X_k \sim X$, the resulting queueing model is a slotted-time $M^{[X]}/D/1$ queue, with:

- service time $U_0$;
- batch size defined by the random variable $S = \lceil X / Y_0 \rceil$;
- output time axis slotted with slot duration $U_0$.

For a clear exposition of the model we refer to the output fixed-length packets as fragments, since they spring off input packets by means of the fragmentation and padding processes. The output capacity $C$ can be expressed as

$$C = \frac{H + Y_0}{U_0} = \frac{\lambda E[X]}{1 - \eta} \qquad (3)$$

For the statistical equilibrium of the queue to exist it is necessary and sufficient that $\lambda E[S] U_0 < 1$. Given that an input packet arrives a time $\Theta = x$ after the current output slot start time, its sojourn time $D(x)$ in the masking device, conditional on the event $\{\Theta = x\}$, is given by

$$D(x) = U_0 - x + A(x)\, U_0 + N U_0 + S U_0 \qquad (4)$$

where $A(x)$ is the number of fragment arrivals during $x$, $N$ is the number of fragments found in the masking queue at the beginning of the slot, and $S$ is the number of fragments generated by the arriving input packet. If $M(x)$ denotes the number of input packet arrivals in a time $x$, we have

$$A(x) = \sum_{j=1}^{M(x)} S_j \qquad (5)$$

and $P(M(x) = k) = \frac{(\lambda x)^k}{k!} e^{-\lambda x}$ for $k \ge 0$. Hence, we get $E[A(x)] = E[M(x)] E[S] = \lambda x E[S]$ and

$$E[D(x) \mid \Theta = x] = U_0 - x + \lambda x E[S]\, U_0 + E[N]\, U_0 + E[S]\, U_0 \qquad (6)$$

By removing the conditioning, since $\Theta$ is uniformly distributed over $[0, U_0]$, we obtain

$$E[D] = \frac{U_0}{2} + \frac{1}{2} \lambda U_0^2 E[S] + E[N]\, U_0 + E[S]\, U_0 \qquad (7)$$

The expectation $E[N]$ can be found from the well-known analysis of the batch-arrival $M/G/1$ queue (e.g., see [33]):

$$E[N] = \frac{\lambda U_0 \left( E[S^2] - E[S] \right) + \left( \lambda U_0 E[S] \right)^2}{2 \left( 1 - \lambda U_0 E[S] \right)} \qquad (8)$$

The final result is

$$E[D] = U_0 \left( \frac{\lambda U_0 E[S^2]}{2 \left( 1 - \lambda U_0 E[S] \right)} + E[S] + \frac{1}{2} \right) \qquad (9)$$

Eq. (9) yields the mean delay $E[D]$ through the masking device as a function of $U_0$ and $Y_0$. In the numerical examples, we normalize the expected delay through the masking device by the delay through a buffer served with the same output capacity $C$, but no masking. The corresponding model is simply an $M/D/1$ queue with utilization coefficient $\rho = \lambda E[X]/C = 1 - \eta$, so that the mean delay $E[D_{\mathrm{nomask}}]$ through this reference queue is

$$E[D_{\mathrm{nomask}}] = \frac{1}{\lambda} \left[ 1 - \eta + \frac{(1 - \eta)^2}{\eta} \, \frac{E[X^2]}{2 E[X]^2} \right] \qquad (10)$$

The normalized average delay is

$$\frac{E[D]}{E[D_{\mathrm{nomask}}]} = \frac{\dfrac{(\lambda U_0)^2 E[S^2]}{2 \left( 1 - \lambda U_0 E[S] \right)} + \lambda U_0 \left( E[S] + \dfrac{1}{2} \right)}{1 - \eta + \dfrac{(1 - \eta)^2}{\eta} \, \dfrac{E[X^2]}{2 E[X]^2}} \qquad (11)$$

From the constraint on the overhead fraction given in Eq. (3), we obtain $\lambda U_0 = (1 - \eta)(H + Y_0)/E[X]$. By substituting this into the expression of $E[D]$ in Eq. (11), we obtain a function of the parameter $Y_0$ only, for any given value of the overhead fraction $\eta$ and of the input packet flow statistics. This can be minimized under the constraint that the queue be stable, i.e., $(1 - \eta)(H + Y_0) E[S]/E[X] < 1$.

To obtain numerical results, we have estimated the packet length distribution from sample traces taken from the CAIDA repository (www.caida.org). Fig. 4 shows the probability distribution function and the complementary cumulative distribution function of the random variable $X$ obtained from the sample. The average and the standard deviation of $X$ are $E[X] = 550$ bytes and $\sigma_X = 638$ bytes, respectively. The distributions exhibit a pronounced concentration of the probability mass around small packet sizes (from a few tens up to barely one hundred bytes) and around large packet sizes, close to the typical MTU of 1500 bytes. This is a common feature of packet size distributions in real networks. We set the output packet overhead to $H = 20$ bytes, consistent with IP fragmentation, which could be used to implement the masking device at the IP layer.

The left plot of Fig. 5 shows the average normalized delay through the masking device as a function of $Y_0$ for three values of $\eta$. It is apparent that an optimal value of $Y_0$ exists, even if the uneven input packet length distribution and the fragmentation/padding functions introduced by the masking make the graph of $E[D]$ bumpy. The optimal value of $Y_0$ as a function of $\eta$ is shown in the right plot of Fig. 5. A few different values of the output packet length are good choices over the entire range of feasible values of $\eta$; all of them fall between 93 and 148 bytes.
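To illustrate how Eqs. (3), (9) and (11) combine, the snippet below evaluates the normalized delay for a discrete packet-length distribution and scans $Y_0$ for a given $\eta$. The two-point bimodal distribution is a stand-in of ours for the CAIDA sample, and all function names are illustrative assumptions.

```python
import math

def moments_S(pmf, y0):
    """First two moments of the batch size S = ceil(X / Y0) for a
    discrete packet-length distribution {length: probability}."""
    es = sum(p * math.ceil(x / y0) for x, p in pmf.items())
    es2 = sum(p * math.ceil(x / y0) ** 2 for x, p in pmf.items())
    return es, es2

def normalized_delay(pmf, y0, eta, header=20):
    """Eq. (11): E[D] / E[D_nomask] for the single-buffer device, with
    lambda*U0 eliminated via the overhead constraint of Eq. (3).
    Returns None if the masking queue would be unstable."""
    ex = sum(p * x for x, p in pmf.items())
    ex2 = sum(p * x * x for x, p in pmf.items())
    es, es2 = moments_S(pmf, y0)
    lu0 = (1 - eta) * (header + y0) / ex        # lambda * U0, from Eq. (3)
    if lu0 * es >= 1:                           # stability: lambda*U0*E[S] < 1
        return None
    num = (lu0 ** 2) * es2 / (2 * (1 - lu0 * es)) + lu0 * (es + 0.5)
    den = 1 - eta + (1 - eta) ** 2 / eta * ex2 / (2 * ex ** 2)
    return num / den

# toy bimodal length distribution standing in for the CAIDA sample of Fig. 4
pmf = {60: 0.45, 1500: 0.55}
feasible = [y for y in range(40, 800) if normalized_delay(pmf, y, 0.4) is not None]
best = min(feasible, key=lambda y: normalized_delay(pmf, y, 0.4))
```

Scanning `best` over $\eta$ reproduces the kind of curve shown in the right plot of Fig. 5, with the optimum driven by the trade-off between padding loss (small $Y_0$) and header amortization (large $Y_0$).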


Fig. 4. Probability distribution function (left plot) and complementary cumulative distribution function (right plot) of the input packet length random variable $X$.

Fig. 5. Average normalized delay of the masking device vs. output packet payload size $Y_0$ (left plot) and optimal value of $Y_0$ as a function of $\eta$ (right plot).

3.2. Double buffer traffic masking device

Motivated by the typically bimodal probability distribution of packet lengths, we investigate the improvement of the delay-overhead trade-off achievable if we allow the output packets to take two different lengths, instead of a single one, $Y_0$. The masking device block diagram is shown in Fig. 6. The input packet flow is split according to packet length and sent to one of two buffers. The buffer output lines have capacities $C_1$ and $C_2$, with $C_1 + C_2 = C = \lambda E[X]/(1 - \eta)$; we let $C_1 = \alpha C$ and $C_2 = (1 - \alpha) C$, with $\alpha \in (0, 1)$. Packets with $X \le \theta$ are sent to the small packet buffer, whose output is a slotted channel with fixed slot size $U_1$, fixed output length $Y_1$ and capacity

$$C_1 = \alpha \, \frac{\lambda E[X]}{1 - \eta} = \frac{H + Y_1}{U_1} \qquad (12)$$

Packets with $X > \theta$ are sent to the large packet buffer, whose output is a slotted channel with fixed slot size $U_2$, fixed output length $Y_2$ and capacity

$$C_2 = (1 - \alpha) \, \frac{\lambda E[X]}{1 - \eta} = \frac{H + Y_2}{U_2} \qquad (13)$$

Fig. 6. Double buffer masking device scheme.

Let $p_1 = P(X \le \theta)$ and $p_2 = 1 - p_1$. The average delay through this double buffer masking device is

$$E[D] = p_1 E[D_1(\theta, \alpha, Y_1)] + p_2 E[D_2(\theta, \alpha, Y_2)] \qquad (14)$$

where we derive from Eq. (9):

7

A. Iacovazzi, A. Baiocchi / Computer Networks 77 (2015) 1–17

5

j ¼ 1; 2

Single buffer Double buffer

ð15Þ r

E½Sr1 

r

h; E½Sr2 

with ¼ E½dX=Y 1 e jX 6 ¼ E½dX=Y 2 e jX > h, for r ¼ 1; 2. In Eq. (14) we have emphasized that the delays through the small and large packet queues depend on the input length threshold h, the capacity split coefficient a and on the choices of the fixed output payload lengths Y 1 and Y 2 ; U 1 and U 2 can be expressed as functions of Y 1 and Y 2 respectively, by using Eqs. (12) and (13). The optimal average delay is:  E½D  ¼ min p1 minfE½D1 ðh; a; Y 1 Þg þ p2 minfE½D2 ðh; a; Y 2 Þg h;a

Y1

Y2

ð16Þ The optimization is carried out under the constraint that the two queues are stable, i.e., kp1 ðH þ Y 1 ÞE½S1  < C 1 and kp2 ðH þ Y 2 ÞE½S2  < C 2 . The normalized delay-overhead trade-off of the single and double buffer masking devices is compared in Fig. 7. Thanks to the bimodal probability distribution of the input packet lengths, it turns out that the double buffer masking device yields a significantly more favorable trade-off, allowing lower overhead fraction for the same average delay. It appears that the most convenient range of g levels is between 0.2 and 0.5, where the double buffer masking device outperforms the single buffer one. For g levels beyond about 0.6 the optimization of the double buffer masking device leads to design a single buffer device actually, i.e., the threshold and capacity split are set so that all traffic is diverted to the large packet buffer.3 The graphs in Fig. 8 show the optimal values of h (upper left plot), a (upper right plot), and of the output payload packet lengths Y 1 ; Y 2 , as a function of g (bottom plot). It is apparent that above about g ¼ 0:6 the optimization points out that a single buffer is best. As a matter of fact, splitting the output capacity between two buffers incurs multiplexing inefficiency, but this is more than compensated by the reduction of overhead implied by a tailored choice of the output packet payload lengths for small and large packets. When a large overhead is allowed the gain coming from the exploitation of the bimodal nature of the input packet PDF is outweighted by the multiplexing inefficiency (i.e., one queue can be idle while the other one is busy).

Normalized average delay

  kpj U j E½S2j  1  þ U j E½Sj  þ ; E½Dj  ¼ U j  2 2 1  kpj U j E½Sj 

4

3

2

1

0

0

0.2

0.4

0.6

0.8

1

Average overhead fraction, η Fig. 7. Trade-off between normalized average delay and output average overhead fraction for the single and double buffer masking devices.

3.3. Confidentiality analysis

Since the obfuscated traffic flow is made of fixed length packets emitted at fixed times, continuously and in both directions, no information on individual packet lengths or gap times can be gained by the adversary from the observed traffic features. Due to the optimization carried out in the selection of the masked traffic parameters (Y₀ and U₀ for the single buffer masking device; Y₁, Y₂, U₁ and U₂ for the double buffer one), some information on the original traffic flow does leak: namely, no more than the packet length and gap time statistics that are used in the optimization procedure. The full information used in the optimization consists of the average packet arrival rate λ, the packet length PDF and the target overhead level η. The adversary can set up equations involving those quantities by equating the known fixed tunnel rate with the average input information rate, increased according to the overhead level, namely

$$\frac{H+Y_j}{U_j} = C_j = \frac{\lambda p_j E[X_j]}{1-\eta}, \qquad j=1,2 \tag{17}$$

with E[X₁] = E[X | X ≤ θ] and E[X₂] = E[X | X > θ]. A similar expression is obtained in the case of a single buffer. Other equations can be derived by setting to zero the derivatives of the target optimization function with respect to the optimization variables, i.e., Y₀ in the case of the single buffer device, and α and θ in the case of the double buffer device. The extent of the leakage due to the optimized parameters is investigated in the next subsections. We denote the observed traffic features as Φ₁ ≡ {Y₀, U₀} and Φ₂ ≡ {Y₁, U₁, Y₂, U₂} for the single and double buffer masking device, respectively. We distinguish two different assumptions on the a priori knowledge of the adversary, namely no prior knowledge (Ψ = ∅) and full prior knowledge (Ψ = {PDF(X), λ, η}).

³ Due to the equality sign in the split test, the least size input packets are still sent to the small packet buffer.
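As a minimal illustration of the adversary's bookkeeping behind Eq. (17): observing the fixed payloads and gaps of the two channels, and assuming she knows the header size and the overhead target, she can recover the per-class input byte rates λp_jE[X_j]. All numbers below are invented for illustration.

```python
# Adversary-side use of Eq. (17): C_j = (H + Y_j)/U_j = lam*p_j*E[X_j]/(1 - eta).
# All values here are illustrative assumptions, not the paper's settings.
H = 40               # tunnel header size (bytes), assumed known to the adversary
Y = (120, 1460)      # observed fixed payloads of the two channels (bytes)
U = (0.010, 0.004)   # observed fixed inter-packet gaps (s)
eta = 0.4            # overhead target, assumed known a priori

for j, (y, u) in enumerate(zip(Y, U), start=1):
    C = (H + y) / u            # observed constant channel rate (bytes/s)
    in_rate = C * (1 - eta)    # inferred lam * p_j * E[X_j]
    print(f"channel {j}: C = {C:.0f} B/s -> lam*p_j*E[X_j] = {in_rate:.0f} B/s")
```

Note that only long-term averages are recovered this way; nothing about the instantaneous activity is revealed, as discussed next.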

3.3.1. Adversary with no a priori knowledge

We calculate the leakage metric E with reference to the packet length PDF only, i.e., the quantity to be protected is U = X. With no a priori information, the PDF of the packet lengths corresponds to the maximum entropy H(U) = log₂ n_X, where n_X is the number of possible packet length values. The equations that the adversary can derive once she observes the optimized parameters of the covered traffic can be simplified by approximating the moments E[⌈X/Y₀⌉^r] with E[X^r]/Y₀^r. Even then, in the case of the single buffer masking device, we get two equations involving four unknowns,

A. Iacovazzi, A. Baiocchi / Computer Networks 77 (2015) 1–17

Fig. 8. Optimal values of: (i) the input packet length threshold θ (upper left plot); (ii) the small packet buffer output capacity fraction α (upper right plot); and (iii) the output payload packet lengths Y₁ and Y₂ (bottom plot), as a function of the average overhead fraction η.

i.e., η, λ and the first two moments of X. In the case of the double buffer masking device we get four equations in the unknowns η, λ, p_th ≡ P(X ≤ θ),⁴ and the first two moments of the random variables X₁ = X|X ≤ θ and X₂ = X|X > θ. In the following we upper bound the leakage by assuming that the adversary can anyway obtain information on the PDF of X, specifically up to the first two moments and the threshold probability p_th. As an example, under the condition that E[X] and E[X²] are known, i.e., Φ = {E[X], E[X²]}, the PDF of the packet length X can be found according to the maximum entropy principle as p_k = P(X = k) = 2^(−a−b₁k−b₂k²), k = 1, …, n_X. The constants a, b₁ and b₂ are found from the normalization condition and by imposing the known values of the first two moments. The resulting entropy is H(U|Φ) = a + b₁E[X] + b₂E[X²]. Other sets of known parameters can be dealt with similarly.

Fig. 9 plots E as a function of η for Ψ = ∅ and various assumptions on Φ. It is apparent that when the adversary can only estimate the mean value of the packet length, the realized leakage is minimum, especially in the range of practical values of η. The assessment of E must account for the fact that getting full knowledge here (i.e., achieving E = 0) means that the adversary learns exactly the packet length PDF thanks to the leakage of the protected tunnel, given that she has no a priori knowledge. Even this extreme result is just what we assume as the a priori knowledge of the adversary in the next subsection. Moreover, even knowing the PDF of packet lengths, the adversary cannot tell anything about the actual activity of the users communicating through the protected tunnel at any given time, nor whether there is any activity at all.

⁴ As an example, for 0.2 ≤ η ≤ 0.6 the optimal θ is always close to about 200 bytes. Then p_th can be estimated by observing the ratio C₁/(C₁+C₂): the adversary gets a hint on the fraction of input packets with lengths no larger than θ.

3.3.2. Adversary with full knowledge of the offered traffic parameters

We assume here that the adversary knows the set of applications that users in networks N_A and N_B can use, and knows statistics of the features of the traffic produced by those applications (e.g., volumes of traffic of each session, sequences of packet lengths of each session, etc.). As an example, if the adversary knows that most of the traffic has to do with web surfing and access to file repositories through web interfaces, she can infer that
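The closed-form entropy used above, H(U|Φ) = a + b₁E[X] + b₂E[X²] for the maximum entropy PDF p_k = 2^(−a−b₁k−b₂k²), can be checked numerically. The exponents b₁, b₂ below are arbitrary illustrative values (not fitted to real traffic moments); a follows from normalization.

```python
import math

n_X = 1500                 # number of possible packet length values (illustrative)
b1, b2 = 1e-3, 1e-6        # arbitrary exponents for the sketch

# Normalization: a = log2( sum_k 2^-(b1*k + b2*k^2) ), so that sum_k p_k = 1.
a = math.log2(sum(2.0 ** (-(b1 * k + b2 * k * k)) for k in range(1, n_X + 1)))
p = [2.0 ** (-(a + b1 * k + b2 * k * k)) for k in range(1, n_X + 1)]

EX = sum(k * pk for k, pk in zip(range(1, n_X + 1), p))
EX2 = sum(k * k * pk for k, pk in zip(range(1, n_X + 1), p))

H_direct = -sum(pk * math.log2(pk) for pk in p)   # entropy computed directly
H_closed = a + b1 * EX + b2 * EX2                 # closed form used in the text
print(H_direct, H_closed)                         # the two values coincide
```

The identity holds exactly because −log₂ p_k = a + b₁k + b₂k², so averaging it over p yields the closed form.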

Fig. 9. Normalized conditional entropy E of the packet length PDF estimated by an adversary with no a priori knowledge, as a function of the average overhead fraction η. The curves correspond to different observed feature sets: E[X]; E[X], E[X²]; E[X₁], E[X₂]; P(X ≤ θ); P(X ≤ θ), E[X]; P(X ≤ θ), E[X], E[X²].

the application protocol mostly used is HTTP, and she could even get samples of that kind of traffic from the open Internet, so as to learn its statistical features. Formally, we are assuming here that Ψ = {PDF(X), λ, η}.

Even endowed with such a priori knowledge, the adversary cannot learn anything more about the actual user activity in a given observation period during which she captures the traffic carried on the protected tunnel. Specifically, the adversary cannot tell whether any given captured packet is actually carrying user information or only dummy bits. Any hypothesis the adversary could make is based entirely on her prior knowledge, and no further evidence is gained by observing the constant rate traffic carried continuously on the tunnel in both directions. Formally, we have E = 1 in this case.

What is being fully protected against such a knowledgeable adversary is the actual activity of the users at any given time. The (a priori) knowledge of the adversary has only to do with global, long-term characteristics of the traffic (e.g., packet length PDF, type of applications used and possibly their relative frequency). The adversary cannot tell, however, which of the users in networks N_A and N_B are actually communicating at any given time, whether there is any communication at all, and what it consists of (e.g., what application is being used at any given time).

In the ensuing sections we move from a theoretical model, mainly aimed at a basic understanding of the interplay of the masking device parameters, to a realistic example, based on network protocol emulation of a full-blown, real application.

4. Framework for masking performance assessment

In order to assess the impact of masking techniques, both in terms of packet delay and in terms of overhead, we have set up an emulation based test-bed where masking is applied to a link between the subnetworks N_A and N_B. To this end, we have used the Common Open Research Emulator (CORE) to emulate the behavior of the computer applications and the traffic they generate in the presence or absence of masking techniques. CORE has been chosen since it offers a lightweight representation of computer networks for emulation. It also allows us to easily run real applications that need to communicate over the network. In CORE, each host/router is emulated through virtualization of the operating system. The virtualization replicates only the network stack and the functions strictly necessary for emulation, thus avoiding replication of the entire OS image, with considerable saving of memory space. Even though the network is emulated, the exchange of messages between two nodes is a real exchange, in the sense that actual protocol stacks are involved, as in an operational network with physical nodes. These features make CORE particularly attractive for emulating large scale networks on commodity hardware. For further details about CORE features we refer the reader to [34].

Hosts in the two subnets N_A and N_B run the application used to test the performance of the masking device. In our experiment setting, the masking device is realized on the edge routers interconnecting the subnets N_A and N_B via the intermediate insecure subnetwork N_C. All the traffic exchanged between hosts located in the subnets N_A and N_B is carried through a tunnel between the edge routers, and traffic masking is applied to the aggregated traffic of the tunnel. Traffic masking could also be realized at the involved end hosts; the practical implementation depends on the desired scope of the privacy coverage.

4.1. Trade-off between information leakage and performance of traffic masking

The masking device consists of a single buffer whose output is shaped to have fixed length packets (packet payload equal to Y₀) sent at fixed times (inter-packet gap time equal to U₀).
Typical network traffic exhibits quite concentrated packet length probability distribution functions, with a few recurrent lengths and the majority of length values having very small probability. Most of the packets carried by operational networks are either relatively short (TCP ACKs, DNS messages, signaling messages of applications in general) or consistently long (with typical length close to the Ethernet MTU of 1500 bytes). Having a single fixed payload length Y₀ at the output of the masking device therefore incurs a heavy overhead. The overhead can be reduced by allowing different fixed length values Y_j, j = 1, …, ℓ, as shown in Section 3 for ℓ = 2. The masking device is configured as ℓ FIFO buffers, served by ℓ channels. The j-th channel sends packets with fixed payload Y_j and fixed inter-packet gap time U_j. Incoming packets belonging to the traffic flow being masked are split among the ℓ buffers according to their length X. The decision thresholds of the traffic splitting can be optimized to minimize the mean delay through the masking device for a given overhead fraction.

A multiple buffer masking device as the one described here deviates from the perfect masking device described in Section 2. In fact, information on the original packet length distribution leaks through the masking device: specifically, the fraction of the input traffic corresponding to each packet length interval can be grossly estimated by looking at the rates of the masking device output channels. The trade-off between leakage and performance can be controlled by choosing ℓ. No leakage occurs for ℓ = 1. The most detailed knowledge of the input packet lengths is leaked if ℓ is set equal to the number of different length values of the input traffic flows (at most as many as the MTU value, but usually much fewer; e.g., ten different length values can cover more than 90% of the probability mass of the packet length PDF).

In the following we address the case ℓ = 2. Our main point here is to assess the performance benefit brought about by a minimal leakage of input packet length information. Essentially, the leakage amounts to a gross estimate of the ratio between "small" packets and "large" packets in the input traffic flow. With ℓ = 2, a single threshold θ must be chosen. Packets with lengths X < θ are directed to the "small" packet buffer, the others to the "large" packet buffer. Output packet payload lengths and inter-packet gap times are fixed at Y₁, U₁ for the "small" packet output channel and Y₂, U₂ for the "large" packet output channel.
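The ℓ = 2 splitting rule, and the single quantity it leaks, can be sketched on a synthetic packet trace (lengths invented for illustration):

```python
# l = 2 splitting: packets with length X < theta go to the "small" buffer,
# the rest to the "large" one. The trace below is synthetic and illustrative.
theta = 829
trace = [52, 1500, 64, 1500, 40, 1500, 52, 500, 1500, 60]  # packet lengths (bytes)

small = [x for x in trace if x < theta]
large = [x for x in trace if x >= theta]

# All an eavesdropper can grossly estimate from the two channel rates is the
# split fraction p_th ~ C1/(C1 + C2) -- here computed directly for clarity.
p_th = len(small) / len(trace)
print(f"{len(small)} small / {len(large)} large packets, p_th = {p_th:.1f}")
```

No individual packet length, gap time, or activity pattern is exposed; only the aggregate small/large split leaks through the two channel rates.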

4.2. Emulation testbed setting

To obtain a faithful yet simple emulation of the masking device, we have considered the layout shown in Fig. 10 for the single buffer masking device and the one in Fig. 11 for the double buffer masking device.

Fig. 10. Network topology of the framework emulating the single buffer masking device.

Fig. 11. Network topology of the framework emulating the dual buffer masking device.

By observing the topology emulating the single buffer masking device, we can identify three types of nodes:

- the application hosts belonging to the subnets N_A and N_B, which run an application generating traffic to one or more hosts of the same type, placed in the same or in the other subnet;
- the dummy hosts Hdum_i, set to generate dummy packet flows which will be mixed with the real traffic;
- the two routers Renc_i, specialized in multiplexing/demultiplexing useful and dummy traffic, in creating a single stream of IP packets of the same length L₀ = H + Y₀ (by using fragmentation and padding), and in sending the fixed length packet stream with a fixed inter-packet time U₀.

In this topology, all traffic flowing between the two subnets N_A and N_B passes through an encrypted tunnel between the two routers Renc1 and Renc2.

In the topology with dual buffer we find the following nodes:

- the application hosts;
- two routers Rsplit_i that connect the subnets N_A and N_B with the insecure network N_C and carry all the traffic going from N_A to N_B and vice versa;
- four dummy hosts Hdum_i;
- four routers Renc_i acting as endpoints of two encrypted tunnels.

Routers Renc_i are connected to subnet N_C via Ethernet interfaces. Renc1 and Renc3 send out packets of fixed length L_small = H + Y₁, while Renc2 and Renc4 set the fixed length of the packets they send out to L_large = H + Y₂. Routers Rsplit1 and Rsplit2 have the task of routing traffic from one subnet, N_A or N_B, to the insecure network N_C (and vice versa). The routing tables of these two routers are configured so as to obtain a routing based on the length of the incoming packet. The routing algorithm is very simple.
Let us consider the router Rsplit1: after fixing a threshold θ, all packets arriving from the network N_A undergo a check on the packet length; if the length is smaller than θ, the packet is forwarded to the router Renc1, while if it is greater than or equal to θ, it is forwarded to the router Renc2.

Dummy hosts Hdum_i, for i = 1, …, 4, have been designed to generate a continuous stream of dummy packets. Hosts Hdum1 and Hdum3 send packets of fixed length L_small, while hosts Hdum2 and Hdum4 set their packet lengths to L_large. For both topologies, dummy packets are generated by the Hdum nodes at a rate greater than what the connection between the edge routers of the subnets N_A and N_B can work off. The dummy packets are inserted inside the stream of useful traffic in order to fill the vacant time intervals in which there is no useful packet to send. The generation of dummy packets is handled by the open source software Scapy 2.2.0 [35]. The dummy packets are addressed to the dummy host opposite to the one that generates them; as an example, the host Hdum1 sends its packets to the host Hdum3 and vice versa. When dummy hosts receive the packets, they simply discard them. All dummy packets are processed by the routers through static routing, so as to ensure that they fill the link capacity for which they were generated.

Routers Renc_i, for i = 1, …, 4, aim at creating a continuous stream of packets, all of the same length and equally spaced in time. Let us consider the router Renc1: it receives as input all packets coming from Rsplit1 with length smaller than θ and the continuous flow of dummy packets from the host Hdum1. These packets are rerouted towards the eth0 interface in order to reach Renc3. A double buffer is associated with the eth0 interface, used to assign different priorities to useful and dummy packets. Packets from Rsplit1 are conveyed to the high priority buffer, while the dummy packets are routed to the low priority one. When the router needs to send a packet on eth0, it extracts the first packet available in the queue with higher priority; if no packet is present in the high priority queue, the dummy packet buffer, which is always non-empty, is selected. Packets sent out on eth0 undergo fragmentation and/or padding to ensure that outgoing packets all have the same fixed length L_small, and then they are encapsulated in an encrypted channel previously established with the router Renc3.⁵ In addition, the router receives packets from Renc3 through the encrypted channel. They are decrypted, padding is removed, and then the packets are routed according to their destination IP address. The same work is performed by all the other routers; the only difference is that the routers on the bottom link of the figure do not require fragmentation, since the protected tunnel fixed length L_large is set equal to the eth1 interface MTU, i.e., L_large = 1500 bytes. This choice is motivated by the IP packet length probability distribution of the considered application (see Fig. 4): most of the mass is concentrated on relatively small and large packet sizes.
This is typical of IP traffic.
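A minimal sketch of the shaping discipline just described, under simplifying assumptions (illustrative sizes, encryption omitted): every slot emits exactly one fixed-size packet, real traffic (fragmented and padded) has priority, and dummies fill idle slots.

```python
from collections import deque

L_FIXED = 100   # fixed output packet size (bytes), header included; illustrative
H = 20          # assumed per-packet header (bytes); payload capacity L_FIXED - H

def fragment_and_pad(pkt_len):
    """Cut one input packet into fixed-size outputs (the last one padded)."""
    cap = L_FIXED - H
    n = -(-pkt_len // cap)            # ceiling division
    return [("real", L_FIXED)] * n

def shape(arrivals, n_slots):
    """Every slot (one per U seconds) emits exactly one fixed-size packet:
    a real fragment if the high priority buffer is non-empty, else a dummy."""
    high = deque()                    # high priority buffer (real traffic)
    out = []
    for slot in range(n_slots):
        for pkt_len in arrivals.get(slot, []):
            high.extend(fragment_and_pad(pkt_len))
        # The dummy source is modeled as always backlogged (low priority buffer).
        out.append(high.popleft() if high else ("dummy", L_FIXED))
    return out

# A 150-byte packet arriving in slot 0 becomes two fixed-size fragments;
# the remaining slots are filled by the always-backlogged dummy source.
print(shape({0: [150]}, n_slots=4))
```

Whatever the input process, the wire always carries one L_FIXED-byte packet per slot, which is exactly the masking property the routers Renc enforce.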

5. Case study: masking applied to a secure multiparty computation system

The masking technique with single or double buffer, based on dummying, fragmentation and padding, is especially meant to be applied to aggregated traffic carrying a significant amount of data over extended spans of time. While the model based performance evaluation of Section 3 focuses on packet level metrics, in this section we use the network emulation testbed described in Section 4 to assess performance as seen from the application point of view. The delay is therefore the time taken to complete an application significant task. Delay and overhead are evaluated both with and without traffic masking. We remark that our experiments with a full blown application and networking environment bring about a realistic evaluation of the user perceived performance.

The delay introduced by the masking device affects the application dynamics, so that the traffic impacting the masking device is dependent on the masking device itself and its chosen parameters. This closed-loop interaction between the application state machine and the masking device cannot be accounted for in trace-driven simulations, where open loop input traffic is offered to the device. Even if traces are collected in real networks, they represent a sample of packet traffic that does not account for the effect of the introduction of the masking device into the data path. To gain a full understanding of real operation, the experiments must be done by running both the applications generating the traffic to be protected and the masking devices simultaneously. In such a context, an application distributed over multiple hosts that requires continuous cooperation, with a sustained transfer of data among them, is well suited.

We have chosen a distributed application of secure multiparty computation (SMC), in which several parties jointly compute a function over their inputs, while at the same time keeping those inputs private. Given the security requirements of this kind of application, it makes sense to enhance the secure channels among the hosts with traffic confidentiality protection through masking. We used SEPIA, a Java implementation of generic secure multiparty computation developed at ETH Zurich [36]. SEPIA offers basic primitives, optimized for processing high-volume input data. It is based on Shamir's secret sharing scheme and is secure in the honest-but-curious adversary model. Compared to other SMC tools, SEPIA allows grouping operations in rounds, which yields faster execution times for a single operation. For further information about SEPIA and its implementation please refer to [37]. The SMC system has been integrated into our masking mechanism by associating the application players to the hosts of the two subnetworks N_A and N_B of Fig. 11.

⁵ In our experiments, the SSH protocol has been used to establish the secure tunnel, but any other type of secure channel would be possible, e.g., an IPSec tunnel.
In the next subsections we detail the considered application scenarios and how the various players of the distributed platform are divided between subnets N_A and N_B. Then we present the results of the performance tests obtained by emulating the SMC application, according to the networking scenario described in Section 4.

5.1. The SEPIA SMC architecture

Input Peers (IPs) and Privacy Peers (PPs) are the two kinds of entities defined in SEPIA. The IPs own the private data which will be the input of the computation, while the PPs perform the secure and distributed computation on the data. Even though PPs are logically different from IPs, a host can take the role of both PP and IP without impairing the security level provided by the SMC system. However, in our testbed, IPs and PPs are always considered as separate entities, so a separate virtual machine is instantiated for each IP and each PP. Let N and M denote the number of IPs and PPs, respectively. Based on the Shamir secret sharing protocol implemented in SEPIA, the i-th IP generates a random polynomial f(·) and computes M shares of its private input a_i, i = 1, …, N. The i-th IP sends the j-th share to the j-th PP, j = 1, …, M. The PPs collaborate in order to compute the function on the shares received, and the final result is sent back to all IPs. We do not dwell on the details of

Fig. 12. The two topology mappings considered in the SEPIA performance evaluation experiments: (a) Topology 1; (b) Topology 2.

how the shares are calculated and how the PPs operate to compute the final result. We just need to know that, since Shamir's secret sharing scheme is linear, calculating the sum of the inputs takes only one round of communication among the PPs, while multiplication requires two rounds of communication.

In order to embed the SMC system just described in the network topology of Fig. 11, we considered two mappings. In the first one (Fig. 12(a)), the IPs and the PPs are equally divided between the two subnets N_A and N_B; in the second topology mapping (Fig. 12(b)), we place all IPs in subnet N_A, while the PPs are placed in N_B. In our experiments we have considered N = 8 IPs and M = 6 PPs. Among the several secure multiparty computation operations implemented in SEPIA, we selected the basic addition on shared secrets, which requires the least computational cost; in order to get a significant volume of traffic among the nodes, we set the number of operations carried out for each round to S = 100,000.

We have performed initial tests without masking and without any bandwidth limitation, in order to understand the behavior of the application in terms of the amount of traffic over the network N_C of Fig. 12, the time evolution of the overall bandwidth required, and the time taken for the execution of a round. Fig. 13 shows the cumulative distribution function of the packet lengths observed across the network N_C. Most of the probability mass is concentrated around a couple of length values: roughly 50% of the packet lengths are around 50 bytes, while the remaining 50% have the maximum length allowed by the network configuration, namely 1500 bytes. This motivates the choice L_large = 1500 bytes made for the fixed output packet size of the large packet masking device queue. In Fig. 14, the bidirectional bandwidth demand on the edge router connection through the network N_C is plotted as a function of time for the two test topologies.
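As background, the linearity of Shamir's scheme, which makes the one-round addition used here possible, can be illustrated with a toy sketch. This is didactic only, not SEPIA code: the field size and polynomial degree are arbitrary choices for the example.

```python
import random

P = 2**31 - 1   # a prime modulus (illustrative; SEPIA's field is not restated here)
M = 6           # number of privacy peers
T = 2           # polynomial degree (reconstruction threshold minus one)

def share(secret):
    """Shamir shares of `secret`: evaluations of a random degree-T polynomial."""
    coeffs = [secret] + [random.randrange(P) for _ in range(T)]
    return [sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
            for x in range(1, M + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 over the points (1, s1), ..., (M, sM)."""
    total = 0
    for j, sj in enumerate(shares, start=1):
        num = den = 1
        for m in range(1, M + 1):
            if m != j:
                num = num * (-m) % P
                den = den * (j - m) % P
        total = (total + sj * num * pow(den, P - 2, P)) % P
    return total

secrets = [11, 22, 33]                       # private inputs of three IPs
all_shares = [share(s) for s in secrets]
# Each PP adds, locally, the shares it received: a communication-free step.
summed = [sum(col) % P for col in zip(*all_shares)]
print(reconstruct(summed))                   # recovers 11 + 22 + 33 = 66
```

Because the sum of the share polynomials is itself a valid sharing of the sum of the secrets, the PPs never see any individual input.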
We have considered a computational round in which no constraint on the bandwidth offered by the network N_C is imposed. Therefore, the only limit is due to the speed at which the processor is able to transfer data between two virtual routers. Under these conditions, we have observed that a round has a duration of about 160 s.

We have separated the traffic due to packets with lengths below the threshold θ from that due to packets with lengths above the threshold. We have chosen θ equal to 829 bytes.⁶ The large, almost flat central part of the plot in Fig. 13 highlights that the performance results of the double buffer masking device are weakly sensitive to the specific chosen value of θ, provided it is not close to the extremes of the length range. In particular, in Fig. 14(a) and (b) we can compare the temporal evolution of the bandwidth required by small packets for the two topologies, while Fig. 14(c) and (d) show the same comparison for packets with lengths above the threshold θ. The evolution of the bandwidth used on the network N_C for topology 1 highlights a more uniform behavior during the round compared to the second topology, in which one can see a time interval longer than 50 s during which no messages are exchanged through the network N_C. That time interval corresponds to an application phase of a round when the PPs perform their computation on the shares, so messages are exchanged only among PPs. On the other hand, when there is traffic carried by the network N_C, the scenario of topology 2 results in a greater bandwidth demand than that of topology 1.

In the next two subsections we evaluate and compare the performance obtained for the two masking systems with single and double buffer described in Section 4.
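Returning to the choice θ = 829 bytes above: the fragmentation-versus-padding comparison behind it (see footnote 6) can be reproduced generically. The 20-byte header below is an illustrative assumption; the paper's value of 829 bytes follows from its own tunnel header sizes, which are not restated here.

```python
import math

def sent_bytes_frag(Z, H, L_small):
    """Bytes on the wire when a length-Z packet is cut into fixed L_small packets;
    each fragment re-carries an H-byte header, so payload capacity is L_small - H."""
    n = math.ceil((Z - H) / (L_small - H))
    return n * L_small

def largest_frag_favorable(H, L_small=100, L_large=1500):
    """Largest Z for which fragmenting to L_small beats padding up to L_large
    (padding always puts exactly L_large bytes on the wire)."""
    best = None
    for Z in range(L_small, L_large + 1):
        if sent_bytes_frag(Z, H, L_small) < L_large:
            best = Z
    return best

# With an illustrative 20-byte header the crossover sits at 1140 bytes; a larger
# per-fragment header pushes the crossover down toward the paper's 829 bytes.
print(largest_frag_favorable(H=20))
```

The weak dependence on the exact crossover mirrors the paper's observation that performance is not sensitive to θ away from the extremes.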

5.2. Performance for single buffer masking

We test the behavior of the framework emulating the single buffer masking device, applied to the distributed system for multiparty computation, by varying two parameters: the packet length L₀ ≡ H + Y₀ and the inter-packet time U₀ of the packets sent into the encrypted tunnel. We consider four values for the packet length: 100, 500, 1000 and 1500 bytes.

To evaluate the cost in terms of overhead and delay due to masking, we measure the average execution times of a

⁶ Given the fixed sizes of the output IP packets, namely L_small = 100 bytes and L_large = 1500 bytes, Z = 829 bytes is the largest length value such that fragmenting an IP packet of length Z to the fixed length L_small introduces less overhead than padding it to L_large.


Fig. 13. Cumulative distribution function of the packet lengths observed across the network N_C.

round and the average overhead introduced per round. For comparison, the execution time has been computed in the absence of masking as well, under the same bandwidth constraints. With respect to the analysis in Section 3, where the average packet delay is evaluated, here we account for the time required to complete an application round. Let T_mask be the average execution time of a round in the case of masking, and T_nomask the same time in the case without masking. Similarly, we define S_mask as the average amount of traffic (in bit/s) carried over the network N_C tunnels when masking is applied, whereas S_nomask is the same quantity with no masking.

In Fig. 15 we plot the time ratio T_mask/T_nomask as a function of the output bit rate C = L₀/U₀ of the routers Renc1/2. The two graphs in Fig. 15 show that, for both topologies, the time ratio asymptotically tends to 1 as C increases. The time ratio also gets closer to 1 the bigger the output packet length L₀ is. Higher values of packet length give better performance in terms of overhead as well, as one can see in Fig. 16, where the traffic ratio S_mask/S_nomask is plotted as a function of the output bit rate C. There is clearly a trade-off between delay and overhead: as C grows, the time ratio tends to 1, while the traffic ratio grows. While it appears that time ratio levels not far from 1 can be readily achieved, the traffic ratio never drops

Fig. 14. Evolution of the bandwidth usage on the network N_C for topologies 1 (left column) and 2 (right column). Top plots refer to traffic due to packets with length less than θ; bottom plots refer to the rest of the traffic (packets with length greater than or equal to θ). The threshold θ is set to 829 bytes.


Fig. 15. Performance of masking in case of single queue: duration ratio vs. output bit rate.

below about 2.7, i.e., quite a substantial overhead can be expected with a single buffer masking device. The trade-off between traffic ratio and time ratio is plotted in Fig. 17 for the two test topologies.

5.3. Performance for double buffer masking

Starting from the bandwidth required in the network N_C in order to perform the distributed computation, we set the two packet lengths to L_large = 1500 bytes and L_small = 100 bytes, and the output bit rate for small packets to C_small = 100 kbit/s at the routers Renc1/3. As in the single buffer experiments, we measured the average execution times of a round and the average overhead introduced per round. For comparison, the execution time was computed in the absence of masking as well, under the same bandwidth constraints. The tests were performed by varying the output bit rate C_large of the routers Renc2/4. In Fig. 18(a) we plot the ratio T_mask/T_nomask as a function of the output bit rate C_large of the routers Renc2/4.

The graph shows that, for both topologies, the duration ratio asymptotically tends to 1 as the bandwidth increases. It is interesting to see that the curve relative to topology 2 always lies well above the curve of topology 1, for which the time ratio tends to 1 faster. Fig. 18(b) shows the trend of the ratio S_mask/S_nomask as the output bit rate C_large varies. In this graph we can see that, for both topologies, it is necessary to at least double the volume of traffic injected into the network with respect to the volume produced by the application. The two curves in Fig. 18(b) are concave: they attain the minimum of the average overhead at C_large = 200 kbit/s for topology 1 and at C_large = 1200 kbit/s for topology 2. This concave behavior departs from the prediction of the analytical model. In fact, the traffic offered to the masking device in the analytical model is open loop, i.e., its characteristics do not change as the masking device parameters are varied. With the full blown application of our testbed, messages are generated in response to delivered messages, according to the evolution of the state machine of the

Fig. 16. Performance of masking in case of single queue: traffic ratio vs. output bit rate.


Fig. 17. Performance of masking in case of single queue: trade-off between traffic ratio and duration ratio.

Fig. 18. Performance of masking in case of double queue.


application entities. The delay inserted by the masking device induces further delays in the opposite direction of the data exchange between two peers. The time gaps in the application dialog pattern are filled with dummy packets by the masking device. As the allowed output capacity of the masking device is increased, the sending of messages is sped up and the dead times of the application entities are reduced. Hence, fewer dummy packets are required, and this beneficial effect more than compensates for the increased overhead due to the bigger allowed output bandwidth. Fig. 18(c) shows the trade-off between traffic ratio and time ratio. A time ratio close to 1 can be obtained only at the price of very large overhead values. For a time ratio up to 1.2 for topology 1, and up to 1.8 for topology 2, the double buffer masking device yields a traffic ratio of 2.4. The minimum traffic ratio values are 1.3 and 2, respectively. This is not a bad result, considering that confidentiality protection is essentially complete, the only leaked information being the fraction of the overall input traffic consisting of small packets. Comparing the trade-off of Fig. 18(c) with that of Fig. 17 shows the improvement this small leakage buys.
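The dummy-filling mechanism just described can be illustrated with a toy slotted model: at every output slot the constant-rate masking device emits a queued real packet if one is available, otherwise a dummy. This is an illustrative sketch, not the paper's emulator; raising the output rate shortens the application's dead times and so reduces the dummies needed.

```python
# Toy slotted simulation of a constant-rate masking output. Dead times in
# the application dialog (slots with an empty queue) are filled with dummy
# packets, as required for full masking.

def simulate(arrival_slots, total_slots):
    """arrival_slots: slot indices at which real packets become ready.
    Returns (real_sent, dummy_sent) over total_slots output slots."""
    arrivals = sorted(arrival_slots)
    queue = i = real = dummy = 0
    for t in range(total_slots):
        # enqueue every packet that has arrived by slot t
        while i < len(arrivals) and arrivals[i] <= t:
            queue += 1
            i += 1
        if queue > 0:
            queue -= 1
            real += 1   # a real packet fills this slot
        else:
            dummy += 1  # idle slot: a dummy packet is emitted instead
    return real, dummy
```

For example, with real packets ready at slots 0, 1 and 10 and twelve output slots, the long gap between slots 2 and 9 is filled entirely with dummies: `simulate([0, 1, 10], 12)` returns `(3, 9)`.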

6. Conclusion

We address the protection of the confidentiality of traffic flows in a packet network against traffic analysis. An effective, packet-level approach to confidentiality protection has been illustrated: it is based on resampling packet sizes and inter-packet gap times from probability distributions that are independent of the original ones. As shown in previous works, the optimal masking device for full masking is a traffic shaper that outputs fixed-size packets at fixed times. Optimality consists in minimizing the average packet delay under a constraint on the average output overhead fraction. Overhead is due to packet fragmentation, padding and dummy packets. All operations entailed by the masking device can be readily performed in standard protocols, e.g., fragmentation and reassembly is natively provided by IP.

We analyze the proposed optimal masking device, made up of a single buffer, to find the delay-overhead trade-off. We highlight the significant performance advantage that can be gained on that trade-off if some information on the original traffic flow is leaked by giving up full masking. This can be done with a double masking device, using two buffers, to exploit the typically bimodal probability distribution of packet lengths. The flow to be protected is split into packets with length above a given threshold and those with length no more than the threshold. The two sub-streams are dealt with as in the single buffer masking device, while the overall output bandwidth is split to serve the two buffers. A detailed analysis based on a queueing model provides insight into the double buffer masking device.

Then, we test our traffic masking approach on a secure multiparty computation distributed application, where confidentiality requirements are key to the application itself. We emulate the full-blown networking scenario and protocol stack by using virtual machines interconnected via physical (when hosted in different physical machines) or logical (when in the same physical machine) network interfaces. Both the single buffer and double buffer masking devices have been implemented. In the latter case, the information leakage on the original traffic amounts to the knowledge of the average fraction of ''small'' packets (those whose length is below a given threshold). It is shown that traffic masking can be attained for the considered application by accepting an increase of the traffic volume by a factor of about 2.4 and an increase of the application-level task completion time of about 30%. Hence, (almost) full confidentiality appears to be more appealing for contexts where delay constraints are more valuable than bandwidth. As a next step, we envisage implementing the double buffer masking device in the OpenSSL suite and experimenting with the prototype implementation so as to understand the realized quality of experience for various applications.





Alfonso Iacovazzi received his MSc degree in Telecommunication Engineering from Sapienza University of Rome, Italy, in 2008, and his PhD degree in Information and Communications Engineering from the same university in 2013. Since March 2013 he has been a Postdoctoral Research Fellow at the DIET Department, Rome, Italy, where he is part of the Networking Group. His main research interests include communications security and privacy, traffic analysis and monitoring, traffic anonymization, and cryptography (mathematical aspects and applications).

Andrea Baiocchi received his Laurea degree in Electronics Engineering in 1987 and his PhD degree in Information and Communications Engineering in 1992, both from the University of Roma ''La Sapienza''. Since January 2005 he has been a Full Professor in the Department of Information Engineering, Electronics and Telecommunications of the University of Roma ''Sapienza''. His main scientific contributions concern traffic modeling and traffic control in ATM and TCP/IP networks, queueing theory, radio resource management, MAC protocols, and traffic analysis for flow classification. His current research interests focus on wireless access protocols, specifically for VANET applications, and on protection of traffic privacy against classification adversaries. His research activities have also been carried out in the framework of many national (CNR, MIUR) and international (European Union, ESA) projects, including coordination and responsibility roles. He has published more than a hundred papers in international journals and conference proceedings, has served on the Technical Program Committees of more than forty international conferences, and served for ten years on the editorial board of the telecommunications technical journal published by Telecom Italia.