Protecting traffic privacy for massive aggregated traffic



Computer Networks 77 (2015) 1–17

Contents lists available at ScienceDirect

Computer Networks journal homepage: www.elsevier.com/locate/comnet

Protecting traffic privacy for massive aggregated traffic

Alfonso Iacovazzi *, Andrea Baiocchi

DIET, Sapienza University of Rome, Via Eudossiana 18, 00184 Rome, Italy

Article info

Article history: Received 7 June 2014; received in revised form 10 November 2014; accepted 28 November 2014; available online 4 December 2014

Keywords: Privacy; Traffic masking; Traffic flow classification; Padding; Fragmentation; Internet traffic

Abstract

Traffic analysis has definitely shown that encryption is not enough to protect the privacy of communications implemented over packet networks. The very features of packet traffic, such as packet length statistics, inter-packet times, volumes of exchanged traffic and communication patterns, leak information. The leakage ranges from the kind of application that generates the information flow carried in the supposedly secure connection to parts of its content. We propose traffic masking as a countermeasure. Full confidentiality protection is discussed, and the traffic masking framework is introduced and motivated. The masking device is optimized and its performance assessed both through a general analytical model, mainly useful to gain basic insight, and through a real network emulation of a distributed secure multiparty computation application, where confidentiality requirements are key to the application itself. It is shown that essentially full confidentiality can be attained for a practical distributed security application by accepting an increase of the traffic volume by a factor of 2.4 and an increase of the task completion time of 30%. Hence, (almost) full privacy appears to be more appealing in contexts where delay constraints are more valuable than bandwidth.

© 2014 Elsevier B.V. All rights reserved.

1. Introduction

The information content of isolated messages exchanged between two parties can be safeguarded by means of encryption, but it is much harder to protect sender and/or receiver confidentiality in a broader sense when a structured flow of packetized information is exchanged. Traffic analysis is an example of how side-channel information leakage can be exploited to infer as much knowledge as possible about the observed message flow. In [1] Raymond provides a comprehensive overview of possible attacks that can be carried out using traffic analysis. As also pointed out in [2], the main traffic analysis attacks exploit information regarding timing, communication patterns, packet counting/volume, and on-line/off-line periods in the communication. These types of attacks have been known for decades, and numerous studies have shown their effectiveness in various networking scenarios. Overall, it appears that tools for protecting the confidentiality of packetized information flows still fall short of what is required.

Among the different confidentiality-breaking attacks, traffic classification aims at identifying the type of application protocol within a given set of possibilities. Several approaches are used to achieve this goal; see, e.g., the surveys and review works [3–8] and references therein. Other kinds of confidentiality breaking of supposedly end-to-end secure channels have been demonstrated via statistical analysis of side-channel features. Several works in the literature, e.g., [9–13], deal with the issue of web page fingerprinting. These works point out that downloaded web pages can be identified even if an encrypted and authenticated channel is used.

* Corresponding author.
E-mail addresses: [email protected] (A. Iacovazzi), [email protected] (A. Baiocchi).
http://dx.doi.org/10.1016/j.comnet.2014.11.019
1389-1286/© 2014 Elsevier B.V. All rights reserved.


Identifying sensitive information in healthcare and finance [14], recognizing the language of the conversation in encrypted VoIP transactions [15], obtaining partial transcripts of an encrypted VoIP conversation when the underlying audio is encoded with Variable Bit Rate codecs [16], and predicting users' contextual location [17] are further examples of how information can be stolen by exploiting side-channel leakage due to packet lengths, inter-packet times, packet directions and communication patterns.

In order to thwart traffic analysis, several research groups have explored techniques aimed at hiding side-channel information. An early contribution is [18], where models and trade-offs for anonymity-providing algorithms are discussed for application flows with low latency requirements. Adding a random amount of padding to each packet of a flow is the main countermeasure used in secure channel protocols such as SSH, TLS and IPSec, but it can be readily verified that this method does not lead to an effective masking of revealing traffic features [19]. Convex optimization techniques are used in [20] to modify the original packet length distribution so that it looks like a target distribution (morphing) with minimum overhead. In [21] the optimum padding for masking only packet lengths is found, while in [22] the same authors explore minimum-overhead traffic masking algorithms for several features and the trade-off between traffic confidentiality protection and masking cost. The algorithms developed in these works decide on fragmentation and padding based on a priori knowledge of the statistics of the traffic to be protected. This implies that the masking device must have sufficiently large and quickly accessible memory.
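The weakness of per-packet random padding can be illustrated with a toy Python sketch. It assumes a padding range of 0–255 bytes per packet and two hypothetical flows, one made of small packets and one of near-MTU packets; neither the padding range nor the traces come from this paper or from [19]. Because padding only ever adds bytes, a simple length threshold still separates the two flows.

```python
import random

# Assumption for illustration only: each packet receives 0..MAX_PAD random
# padding bytes, loosely in the style of SSH/TLS/IPsec padding.
MAX_PAD = 255


def pad_randomly(length: int, rng: random.Random) -> int:
    """Return the on-the-wire length after adding 0..MAX_PAD padding bytes."""
    return length + rng.randint(0, MAX_PAD)


def padded_trace(lengths, seed=0):
    """Apply independent random padding to every packet of a trace."""
    rng = random.Random(seed)
    return [pad_randomly(x, rng) for x in lengths]


# Two hypothetical application flows: small packets vs. near-MTU packets.
small_flow = [60, 80, 52, 100, 64] * 20
large_flow = [1500, 1460, 1500, 1400, 1500] * 20

padded_small = padded_trace(small_flow, seed=1)
padded_large = padded_trace(large_flow, seed=2)

# Padding never shrinks a packet, so every padded small packet stays below
# the threshold and every padded large packet stays above it.
threshold = max(small_flow) + MAX_PAD
print(max(padded_small) <= threshold < min(padded_large))
```

Per-packet padding changes the exact lengths but not their ordering between classes, which is why the masking approach discussed below replaces lengths and gaps altogether instead of perturbing them.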
A line of literature has developed on anonymous communication, whose purpose is to guarantee (to some degree) the unlinkability of packets and users, i.e., it must not be feasible to associate a packet captured in the network with a specific user and vice versa. Contributions in this context are [23–26]. In [23] intermediate relay nodes as well as cover traffic are used to anonymize user communications. Operation is carried out at the IP level. Tunnels are created according to a uniform topology of relay nodes. Peer and route selection is done so as to guarantee that connections among peers are independent, in terms of topology and carried traffic intensity, of the actual user communication patterns. The DISSENT client–server architecture is proposed in [24], where client users are provided anonymity as long as there is at least one honest server in the set they select, yet they need not know which one is honest. The major drawback of the approach of [24] is scalability, given the sophisticated cryptographic tools used in DISSENT and the processing they require. In [25,26] the authors propose a peer-to-peer, low-latency anonymous network that is resistant to traffic analysis, in order to overcome the limitations of previous architectures. Besides providing a useful classification and review of many previous contributions, those works propose the Aqua architecture, based on 'circuits' of relay nodes, able to provide sender and receiver anonymity. Again, key to the Aqua architecture is traffic obfuscation obtained by means of cover traffic among the relay nodes.

Other works focus on privacy protection of specific applications. In order to achieve anonymity against web page fingerprinting, Yu et al. [27,28] implemented a new packet padding strategy aiming at offering perfect anonymity for web browsing. Their solution reduces overhead by using as padding the web pages that the user is predicted to download in the future. Luo et al. [29] propose to modify the statistical features of a traffic flow by using a set of transformations at both the application and the transport layer (e.g., packet padding and/or fragmentation, HTTP Range, TCP Maximum Segment Size (MSS) negotiation, modification of the TCP advertised window, HTTP pipelining, etc.). These techniques are designed only for web browsing flows and cannot be applied to the traffic generated by other applications. A traffic reshaping technique is used by Zhang et al. [30]. In their method, an application flow is dynamically split into a set of new flows, which are then dispatched among multiple virtual MAC interfaces. Traffic features are reshaped on each virtual interface to hide those of the original traffic. In [19] the authors give a thorough review of some practical confidentiality-preserving techniques against traffic analysis, highlighting their main weaknesses. Despite the efforts made in recent years to hide side-channel information in security protocols, Dyer et al. show that it is still possible to classify traffic flows after masking, even though they have looked at only some of the techniques known in the literature.

We address the protection of the confidentiality of traffic flows in a packet network against traffic analysis by introducing traffic masking on the whole packet flow transferred between two given endpoints. For example, this could be either the flow belonging to an SSL/TLS connection or the aggregated traffic carried through an IPSec tunnel.
Our proposed traffic masking is based on resampling packet sizes and inter-packet times from probability distributions that are independent of the original packet sizes and inter-packet times. We define an analytical model to gain insight into the performance trade-offs of the traffic shaper (masking device). Then, we apply our approach to a secure multiparty computation application in a real network emulation testbed. We show that a performance benefit can be gained by trading off a limited amount of leakage against the overhead and delay introduced by the masking device. Unlike other works, the approach explored in our study is particularly suitable for aggregated flows, which carry massive volumes of traffic, although it can also be applied to individual traffic streams. Moreover, it minimizes the required processing, thus being viable even in cheap, low-end networking equipment. Specifically, we claim the following main contributions: (i) we provide models to gain understanding of the overhead-delay trade-off of tunnels that provide full traffic obfuscation; (ii) we propose and optimize a dual buffer strategy to improve the overhead-delay trade-off; (iii) we prove the effectiveness of our framework under real network application traffic.

The third point is a key added value with respect to customary trace-driven simulations, where the performance of traffic masking algorithms is tested by feeding open-loop packet traffic streams, possibly collected in operational networks. Since traffic masking algorithms act upon the traffic in transit and introduce delay, there is a closed-loop interaction between traffic masking and the evolution of the application-level state machine; hence, the traffic offered to the masking algorithm is modified by the insertion of the masking algorithm itself. We account for this cross-layer interaction by developing a full-blown application and networking test environment. Our results can be useful to design secure VPNs with augmented confidentiality levels. The traffic masked tunnel we define is also a key building block of anonymity networks, as pointed out, e.g., in [23,25]. We provide an approach to realize such traffic obfuscated channels and to improve their overhead-delay trade-off. In the rest of this paper, we state the traffic confidentiality model in Section 2. Section 3 is devoted to an analytical model to gain insight into the design of the traffic masking device. A case study is analyzed in Section 4, focusing on a real application, namely secure multiparty computation, in a real network emulation testbed. Traffic masking performance is discussed for the single and double buffer masking algorithms in Section 5. Finally, conclusions are drawn in Section 6.

2. Protection against traffic analysis with traffic masking

The reference networking scenario is illustrated in Fig. 1. It comprises two subnetworks $N_A$ and $N_B$, which are connected by means of an insecure network $N_C$. The traffic exchanged between $N_A$ and $N_B$ is conveyed via a secure (encrypted and authenticated) tunnel established between two edge nodes $A \in N_A$ and $B \in N_B$, e.g., an IPsec tunnel, an SSL/TLS connection or an SSH connection.¹ We focus on countermeasures against traffic analysis attacks on the packet flow between A and B. We refer to this kind of protection as traffic masking.

Fig. 1. Scheme of the reference networking scenario.

¹ The same holds if more than one tunnel is established between $N_A$ and $N_B$ at a given time, by simply referring to each tunnel separately.

² The tagged tunnel between A and B can also be part of a protected path through several subnetworks, made up of a cascade of protected tunnels.

2.1. Adversary model and security metric

We assume a passive adversary having full access to the network $N_C$. The adversary can observe the traffic carried through $N_C$, specifically the traffic exchanged between the two edge nodes A and B. The adversary has access only to traffic carried over $N_C$. The edge subnetworks $N_A$ and $N_B$ are either trusted or protected against traffic analysis.² For example, a secure VPN tunnel connecting two corporate private networks through the public Internet fits into this scenario. The adversary is able to perform traffic analysis on the packet flow exchanged between A and B, aiming at any of the targets cited in Section 1. For example, the adversary can analyze features of the observed traffic such as packet lengths, inter-packet gap times and the volume of traffic exchanged over time. These data can be used to gather information about the activity of one or more hosts, e.g., to identify the application generating some portion of the traffic or to (at least partially) break the confidentiality of the exchanged information. We assume that the edge nodes A and B are trusted and cannot be manipulated by the adversary, as is typical of security functions provided by means of a tunnel through an insecure network. Attacks on tunnel establishment (including key establishment), tunnel encryption or authentication are out of the scope of this work, as are DoS attacks on the edge nodes. All of these attacks can be countered by resorting to cryptographic primitives or known DoS countermeasures. Let us consider the two endpoints A and B, within the subnetworks $N_A$ and $N_B$ respectively, and a packet flow exchanged between them. For ease of language, the term packet is used to refer to the data units of the traffic flow, even if they could belong to a layer other than the network layer. Let $X_k$ and $T_k$ be the length of the $k$-th packet and the time elapsing between the $(k-1)$-th and the $k$-th packets ($k$-th inter-packet gap time), respectively, as measured at an edge node.

We use a superscript $(A)$ or $(B)$ to distinguish quantities related to the traffic flow originated from endpoint A or B, respectively. Since we focus on the direction $A \to B$, we omit the superscript $(A)$ when referring to variables of the traffic coming from A, provided there can be no ambiguity. We assume that both the $X$'s and the $T$'s can be modeled as drawn from wide-sense stationary processes, so that for any $k$ we have $X_k \sim X$ and $T_k \sim T$, with $X$ and $T$ being random variables with at least finite first two moments. We observe that essentially all traffic analysis attacks use, in some way, information derived from the sequences $\{X_k^{(A)}\}_{k \in \mathbb{Z}}$, $\{T_k^{(A)}\}_{k \in \mathbb{Z}}$, $\{X_k^{(B)}\}_{k \in \mathbb{Z}}$ and $\{T_k^{(B)}\}_{k \in \mathbb{Z}}$ as input, plus any prior knowledge the adversary might have on the subnetworks $N_A$ and $N_B$. For example, classification of application protocols exploits packet lengths and
time gaps of the packets making up each application session. This motivates the formal definition of our security objective: it is a measure of how much the adversary can learn about the packet lengths and gap times, given the observed information leaked by the packet flow running between A and B.

Fig. 2. Sketch of the end-to-end connection with traffic masking: definition of input and output inter-packet gap times and packet lengths. (Diagram: endpoint A sends packets of lengths $X_1, X_2, X_3$ separated by gaps $T_k$ to the traffic masking device, which forwards packets of lengths $Y_1, \ldots, Y_4$ separated by gaps $U_k$ across the network to endpoint B.)

Let $\Phi^{(A)} \equiv \{X_k^{(A)}, T_k^{(A)}\}_{1 \le k \le m_A}$ be the set of packet lengths and gap times referring to the packets carried from A to B through the tunnel in a given observation time interval; let $\Phi^{(B)} \equiv \{X_k^{(B)}, T_k^{(B)}\}_{1 \le k \le m_B}$ be the analogous set of random variables for the B to A direction in the same observation time. Let also $\Phi = \{\Phi^{(A)}, \Phi^{(B)}\}$. The endpoints A and B transform the traffic to be carried into the tunnel by means of a traffic obfuscation function. The metric related to the traffic analysis attack can be defined after [31] by using the conditional uncertainty on $\Phi$ given: (i) side information $\Psi$ on $\Phi$, known to the adversary; (ii) leakage information $\Omega$ on $\Phi$, gained by the adversary by means of the observed traffic. The degree of traffic masking is measured by the normalized equivocation $E$:

$$E = \frac{H(\Phi \mid \Psi, \Omega)}{H(\Phi \mid \Psi)} \qquad (1)$$
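To make Eq. (1) concrete, the following Python sketch evaluates the normalized equivocation for a toy discrete model with no side information ($\Psi = \emptyset$), using Shannon entropy in bits. The four packet lengths, the uniform prior and the two leakage functions are hypothetical: a constant observation (perfect masking) gives $E = 1$, while an observation revealing whether a packet is "small" or "large" halves the remaining uncertainty and gives $E = 0.5$.

```python
import math
from collections import defaultdict


def entropy(pmf):
    """Shannon entropy in bits of a dict {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def conditional_entropy(joint):
    """H(Phi | Omega) in bits for a joint pmf {(phi, omega): probability}."""
    marginal = defaultdict(float)
    for (_, omega), p in joint.items():
        marginal[omega] += p
    h = 0.0
    for omega, p_omega in marginal.items():
        # Distribution of Phi given this particular observation omega.
        cond = {phi: p / p_omega for (phi, o), p in joint.items() if o == omega}
        h += p_omega * entropy(cond)
    return h


def normalized_equivocation(joint):
    """E = H(Phi | Omega) / H(Phi): Eq. (1) with empty side information."""
    prior = defaultdict(float)
    for (phi, _), p in joint.items():
        prior[phi] += p
    return conditional_entropy(joint) / entropy(prior)


# Hypothetical packet lengths with a uniform (maximum entropy) prior.
lengths = [64, 576, 1024, 1500]
masked = {(x, "fixed"): 0.25 for x in lengths}                       # nothing leaks
leaky = {(x, "small" if x <= 576 else "large"): 0.25 for x in lengths}  # size class leaks
print(normalized_equivocation(masked), normalized_equivocation(leaky))
```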

Here, $H(Z)$ denotes the average entropy of the random variable $Z$. Perfect traffic obfuscation, given the context, corresponds to $E = 1$. In that case, the observed leakage $\Omega$ does not yield any information about $\Phi$ beyond that provided by the prior probability distribution of $\Phi$ assumed by the adversary, given what the adversary already knows, i.e., $\Psi$.

2.2. Traffic masking

A general scheme of the masking device at either edge node is given in Fig. 2. The A–B connection depicted in this figure could represent a tunnel between IPSec security gateways, between two application endpoints of an SSL/TLS connection, or between two mixes of an anonymity network, according to the architectures presented in, e.g., [23,25]. Let $Y_k$ and $U_k$ be the sequences of packet lengths and inter-packet time intervals chosen for the masked traffic flow. The output packet departure times are denoted as $t_{d,k}$, with $t_{d,k} = t_{d,k-1} + U_k$. The masking device shapes the original input traffic flow so as to impose that the $k$-th packet of the output flow has payload length $Y_k$ and is sent at time $t_{d,k}$.

The logical block structure of the packet masking device is shown in Fig. 3. First, the input packet with length $X$ is broken into $N$ new packets of lengths $Y_1, \ldots, Y_N$, whose lengths are drawn from the output packet length probability distribution function $f_Y(\cdot)$. The number of output fragments $N$ is the least integer satisfying both $Y_1 + \cdots + Y_{N-1} < X$ and $Y_1 + \cdots + Y_N \ge X$. The bytes of the input packet are carried by the newly generated fragments. A header of length $H$ is added to each fragment to form the output packet. The header carries the information required to reassemble the original packet at the other end of the secure channel. These new packets are enqueued in a FIFO buffer. At output departure time $t_{d,k}$, a packet is taken from the buffer and sent to the output. If the buffer is empty, a dummy packet is sent to the output, again with length drawn from the probability density function (PDF) $f_Y(\cdot)$.

Fig. 3. Masking device with packet length and gap times masking. (Block diagram: in, packet length masking, data buffer, out; a dummy packet buffer also feeds the output.)

The sources of overhead are fragmentation, padding and dummy packets. The masking device can be optimized by choosing the PDFs $f_Y(\cdot)$ and $f_U(\cdot)$ so as to minimize the expected delay through the device for a given fraction of overhead in the output packet flow. In [32] the following result is proved: given that the original packet lengths and gap times are replaced by random variables independent of the original ones, the optimal masking device consists of a traffic shaper that outputs fixed-size packets at fixed times. The masking device can be practically implemented at different architectural levels, ranging from packet level to application level, depending on the scope and the span of the protection. Reduction of packet lengths to the desired value can be obtained by means of fragmentation and padding, which can be readily done; e.g., fragmentation and reassembly are natively provided by IP. Continuous emission at fixed times can be realized, e.g., by using a leaky bucket logic, as already implemented in the Linux OS, together with a dummy packet generator. The redundancy introduced by dummy packets can be exploited to carry out network coding of the covered packet flow. In case of congestion, network routers can then discard a fraction of the packets of the masked flow without hurting the integrity of the information flow (but possibly impacting the reconstruction delay of the network decoder).

3. The confidentiality-efficiency trade-off of the masking device

In this section we introduce a model of the masking device simple enough to lend itself to a thorough analysis, so as to understand: (i) the overhead-delay trade-off; (ii) how that trade-off can be improved by leaking some controlled information on the input traffic flow. The values of the masking device parameters are found by minimizing the mean delay $D$ through the device under the constraint that the average overhead fraction $\eta$ of the output link is given.
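A minimal Python sketch of the masking device of Section 2.2, specialized to the fixed-size, fixed-time form that [32] shows to be optimal, may help fix ideas before the queueing model. It fragments each input packet into $\lceil X/Y_0 \rceil$ payloads of $Y_0$ bytes (padding the last one), queues the fragments in a FIFO, and emits exactly one packet per slot of duration $U_0$, sending a dummy packet when the queue is empty. The values of $Y_0$ and $U_0$, the time unit and the arrival trace are hypothetical, and the reassembly header is only accounted for as $H$ bytes per output packet.

```python
import math
from collections import deque

# Hypothetical parameters for this sketch (H = 20 bytes matches the value
# used in Section 3.1; Y0 and U0 are arbitrary illustrative choices).
H = 20      # header bytes added to every output packet
Y0 = 120    # fixed output payload length, bytes
U0 = 1.0    # fixed output slot duration, arbitrary time units


def fragment(length, y0=Y0):
    """Split an input packet into ceil(length / y0) fixed-size fragments.

    The last fragment is padded up to y0 bytes. Returns the number of
    fragments and the number of padding bytes.
    """
    n = math.ceil(length / y0)
    return n, n * y0 - length


def shape(arrivals, n_slots):
    """Emit exactly one fixed-size packet at each slot boundary k * U0.

    `arrivals` is a list of (arrival_time, length) pairs sorted by time.
    A queued fragment is sent if the FIFO is non-empty, otherwise a dummy
    packet of the same size is sent. Returns (data, dummy, padding_bytes).
    """
    fifo = deque()
    data = dummy = padding = 0
    i = 0
    for k in range(1, n_slots + 1):
        t = k * U0
        # Fragment every packet that arrived before this departure instant.
        while i < len(arrivals) and arrivals[i][0] < t:
            n, pad = fragment(arrivals[i][1])
            fifo.extend([i] * n)  # each fragment remembers its packet index
            padding += pad
            i += 1
        if fifo:
            fifo.popleft()
            data += 1
        else:
            dummy += 1
    return data, dummy, padding


arrivals = [(0.2, 1500), (0.5, 40), (3.7, 600)]   # hypothetical input trace
data, dummy, padding = shape(arrivals, n_slots=20)
payload = sum(length for _, length in arrivals)
eta = 1 - payload / (20 * (H + Y0))               # measured overhead fraction
print(data, dummy, padding, round(eta, 3))
```

The output is a constant-rate stream of $(H + Y_0)$-byte packets regardless of the input trace; the price shows up as padding, headers, dummy packets and queueing delay, which is exactly the trade-off modeled next.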


Let us consider a masking device with output link capacity $C$, receiving an input packet flow described by inter-arrival times $T_k$ and packet lengths $X_k$. We also let $\lambda \equiv 1/E[T]$.

3.1. Single buffer traffic masking device

Let us first consider a single buffer masking device according to the block scheme outlined in Fig. 3. The masking device shapes the traffic by emitting packets with payload length $Y_0$, header length $H$, and fixed inter-packet gap time $U_0$, so that $C = (H + Y_0)/U_0$. The average overhead fraction is

$$\eta = 1 - \frac{\lambda E[X] U_0}{H + Y_0} \qquad (2)$$

Given the fixed payload length and fixed inter-packet gap time of the output packets, the single buffer masking device can be modeled as a time-slotted, batch arrival, deterministic service time queue. If we assume that input packets arrive according to a Poisson process of mean rate $\lambda$ and that input packet lengths are i.i.d. random variables $X_k \sim X$, the resulting queueing model is a slotted-time $M^{[X]}/D/1$ queue, with service time $U_0$, batch size defined by the random variable $S = \lceil X/Y_0 \rceil$, and output time axis slotted with slot duration $U_0$. For a clear exposition of the model, we refer to the fixed-length output packets as fragments, since they originate from input packets through the fragmentation and padding processes. The output capacity $C$ can be expressed as

$$C = \frac{H + Y_0}{U_0} = \frac{\lambda E[X]}{1 - \eta} \qquad (3)$$

For the statistical equilibrium of the queue to exist, it is necessary and sufficient that $\lambda E[S] U_0 < 1$. Given that an input packet arrives a time $\Theta = x$ after the start of the current output slot, its sojourn time in the masking device, $D(x)$, conditional on the event $\{\Theta = x\}$, is given by

$$D(x) = U_0 - x + A(x) U_0 + N U_0 + S U_0 \qquad (4)$$

where $A(x)$ is the number of fragment arrivals during $x$, $N$ is the number of fragments found in the masking queue at the beginning of a slot, and $S$ is the number of fragments generated by the arriving input packet. If $M(x)$ denotes the number of input packet arrivals in a time $x$, we have

$$A(x) = \sum_{j=1}^{M(x)} S_j \qquad (5)$$

and $P(M(x) = k) = \frac{(\lambda x)^k}{k!} e^{-\lambda x}$, for $k \ge 0$. Hence, we get $E[A(x)] = E[M(x)]\,E[S] = \lambda x E[S]$ and

$$E[D(x) \mid \Theta = x] = U_0 - x + \lambda x E[S] U_0 + E[N] U_0 + E[S] U_0 \qquad (6)$$

By removing the conditioning, since $\Theta$ is uniformly distributed over $[0, U_0]$, we obtain

$$E[D] = \frac{U_0}{2} + \frac{\lambda U_0^2}{2} E[S] + E[N] U_0 + E[S] U_0 \qquad (7)$$

The expectation $E[N]$ can be found from the well-known analysis of the batch arrival M/G/1 queue (e.g., see [33]):

$$E[N] = \frac{\lambda U_0 \left(E[S^2] - E[S]\right) + \left(\lambda U_0 E[S]\right)^2}{2\left(1 - \lambda U_0 E[S]\right)} \qquad (8)$$

The final result is

$$E[D] = U_0 \left( \frac{1}{2} + E[S] + \frac{\lambda U_0 E[S^2]}{2\left(1 - \lambda U_0 E[S]\right)} \right) \qquad (9)$$
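The algebra leading from Eqs. (7) and (8) to Eq. (9) can be checked numerically. The Python sketch below computes $E[S]$ and $E[S^2]$ for $S = \lceil X/Y_0 \rceil$ from a packet length PMF and evaluates both expressions; the two-point PMF and the values of $\lambda$, $U_0$ and $Y_0$ are hypothetical, chosen so that $\lambda U_0 E[S] = 0.7 < 1$.

```python
import math


def batch_moments(pmf, y0):
    """E[S] and E[S^2] for S = ceil(X / y0); pmf maps packet length -> probability."""
    es = sum(p * math.ceil(x / y0) for x, p in pmf.items())
    es2 = sum(p * math.ceil(x / y0) ** 2 for x, p in pmf.items())
    return es, es2


def mean_delay_eq7_eq8(lam, u0, es, es2):
    """E[D] from Eq. (7), with E[N] substituted from Eq. (8)."""
    rho = lam * u0 * es
    if rho >= 1:
        raise ValueError("unstable queue: lambda * U0 * E[S] must be < 1")
    en = (lam * u0 * (es2 - es) + rho ** 2) / (2 * (1 - rho))
    return u0 / 2 + lam * u0 ** 2 * es / 2 + en * u0 + es * u0


def mean_delay_eq9(lam, u0, es, es2):
    """Closed form of Eq. (9)."""
    rho = lam * u0 * es
    if rho >= 1:
        raise ValueError("unstable queue: lambda * U0 * E[S] must be < 1")
    return u0 * (0.5 + es + lam * u0 * es2 / (2 * (1 - rho)))


pmf = {40: 0.5, 1500: 0.5}            # hypothetical two-point packet length PMF
es, es2 = batch_moments(pmf, y0=120)  # S takes the values 1 and 13
d7 = mean_delay_eq7_eq8(lam=1.0, u0=0.1, es=es, es2=es2)
d9 = mean_delay_eq9(lam=1.0, u0=0.1, es=es, es2=es2)
print(es, es2, d7, d9)
```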

Eq. (9) yields the mean delay $E[D]$ through the masking device as a function of $U_0$ and $Y_0$. In the numerical examples, we normalize the expected delay through the masking device by the delay through a buffer served with the same output capacity $C$ but without masking. The corresponding model is an M/G/1 queue (the service time $X/C$ varies with the packet length), with utilization coefficient $\rho = \lambda E[X]/C = 1 - \eta$, so that the mean delay $E[D_{nomask}]$ through this reference queue is

$$E[D_{nomask}] = \frac{1}{\lambda} \left[ 1 - \eta + \frac{(1 - \eta)^2 E[X^2]}{2 \eta E[X]^2} \right] \qquad (10)$$

The normalized average delay is

$$\bar{D} = \frac{E[D]}{E[D_{nomask}]} = \frac{\dfrac{\lambda U_0}{2} + \lambda U_0 E[S] + \dfrac{(\lambda U_0)^2 E[S^2]}{2\left(1 - \lambda U_0 E[S]\right)}}{1 - \eta + \dfrac{(1 - \eta)^2 E[X^2]}{2 \eta E[X]^2}} \qquad (11)$$

From the constraint on the overhead fraction given in Eq. (3), we obtain $\lambda U_0 = (1 - \eta)(H + Y_0)/E[X]$. By substituting this into the expression of $\bar{D}$ in Eq. (11), we obtain a function of the parameter $Y_0$ only, for any given value of the overhead fraction $\eta$ and of the input packet flow statistics. This function can be minimized under the constraint that the queue be stable, i.e., $(1 - \eta)(H + Y_0)E[S]/E[X] < 1$. To obtain numerical results, we have estimated the packet length PDF from sample traces taken from the CAIDA repository (www.caida.org). Fig. 4 shows the probability distribution function (PDF) and the complementary cumulative distribution function (CCDF) of the random variable $X$ obtained from the sample. The mean and the standard deviation of $X$ are $E[X] = 550$ bytes and $\sigma_X = 638$ bytes, respectively. The distributions exhibit a pronounced concentration of the probability mass around small packet sizes (from a few tens up to barely one hundred bytes) and around large packet sizes, close to the typical MTU of 1500 bytes. This is a common feature of the packet size distribution in real networks. We set the output packet overhead to $H = 20$ bytes, which is consistent with IP packet fragmentation, a mechanism that could be used to implement the masking device at the IP layer. The left plot of Fig. 5 shows the normalized average delay through the masking device as a function of $Y_0$ for three values of $\eta$. An optimal value of $Y_0$ clearly exists, even though the uneven input packet length PDF and the fragmentation/padding functions introduced by the masking make the graph of $\bar{D}$ bumpy. The optimal value of $Y_0$ as a function of $\eta$ is shown in the right plot of Fig. 5. Only a few different values of the output packet length turn out to be good choices over the entire range of feasible values of $\eta$, all of them between 93 and 148 bytes.
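The minimization over $Y_0$ described above can be sketched as a brute-force grid search in Python. The sketch fixes $\lambda U_0$ from the overhead constraint, returns infinity for unstable choices of $Y_0$, and evaluates $\bar{D}$ from Eq. (11). The three-point packet length PMF is a hypothetical stand-in for the CAIDA-derived PDF of Fig. 4, which is not reproduced here, so the optimal $Y_0$ it finds need not fall in the 93–148 byte range reported above.

```python
import math

H = 20  # header bytes, as in the text


def moments(pmf):
    """E[X] and E[X^2] of a packet length PMF {length: probability}."""
    ex = sum(x * p for x, p in pmf.items())
    ex2 = sum(x * x * p for x, p in pmf.items())
    return ex, ex2


def normalized_delay(pmf, eta, y0, h=H):
    """D-bar of Eq. (11) for payload length y0, with lambda*U0 fixed by Eq. (3).

    Returns math.inf when the stability condition of the batch queue fails.
    """
    ex, ex2 = moments(pmf)
    es = sum(p * math.ceil(x / y0) for x, p in pmf.items())
    es2 = sum(p * math.ceil(x / y0) ** 2 for x, p in pmf.items())
    lu0 = (1 - eta) * (h + y0) / ex          # lambda * U0
    rho = lu0 * es                           # must stay below 1
    if rho >= 1:
        return math.inf
    num = lu0 / 2 + lu0 * es + lu0 ** 2 * es2 / (2 * (1 - rho))
    den = 1 - eta + (1 - eta) ** 2 * ex2 / (2 * eta * ex ** 2)
    return num / den


def best_payload(pmf, eta, candidates=range(20, 1501)):
    """Grid search for the Y0 minimizing D-bar; returns (D-bar, Y0)."""
    return min((normalized_delay(pmf, eta, y), y) for y in candidates)


pmf = {40: 0.45, 576: 0.10, 1500: 0.45}   # hypothetical bimodal PMF
for eta in (0.3, 0.4, 0.5):
    d, y0 = best_payload(pmf, eta)
    print(f"eta={eta}: optimal Y0 = {y0} bytes, normalized delay = {d:.3f}")
```

A grid search is used only because it is short and transparent; since $\bar{D}$ is bumpy in $Y_0$, an exhaustive scan over integer payload lengths is also a reasonable way to avoid local minima.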


Fig. 4. Probability distribution function (left plot) and Complementary Cumulative Distribution Function (right plot) of the input packet length random variable $X$. (Both panels: input packet length [bytes], 0–1500, on the horizontal axis.)

Fig. 5. Average normalized delay of the masking device vs. output packet payload size $Y_0$ (left plot) and optimal value of $Y_0$ as a function of $\eta$ (right plot). (Left: normalized average delay vs. output packet payload length $Y_0$ [bytes], curves for $\eta$ = 0.3, 0.4, 0.5; right: optimal output packet payload length $Y_0$ [bytes] vs. average overhead fraction $\eta$, single buffer.)

3.2. Double buffer traffic masking device

Motivated by the typically bimodal probability distribution of packet lengths, we investigate the improvement of the delay-overhead trade-off achievable if we allow the output packet flow to take two different lengths instead of the single length $Y_0$. The masking device block diagram is shown in Fig. 6. The input packet flow is split according to packet length and sent to one of two buffers. The buffer output lines have capacities $C_1$ and $C_2$, with $C_1 + C_2 = C = \lambda E[X]/(1 - \eta)$; we let $C_1 = \alpha C$ and $C_2 = (1 - \alpha) C$, with $\alpha \in (0, 1)$. Packets with $X \le \theta$ are sent to the small packet buffer, whose output is a slotted channel with fixed slot size $U_1$, fixed output length $Y_1$ and capacity

$$C_1 = \alpha \frac{\lambda E[X]}{1 - \eta} = \frac{H + Y_1}{U_1} \qquad (12)$$

Packets with $X > \theta$ are sent to the large packet buffer, whose output is a slotted channel with fixed slot size $U_2$, fixed output length $Y_2$ and capacity

$$C_2 = (1 - \alpha) \frac{\lambda E[X]}{1 - \eta} = \frac{H + Y_2}{U_2} \qquad (13)$$

Fig. 6. Double buffer masking device scheme. (Diagram labels: in; big packet buffer for $X > \theta$ and small packet buffer, each with its own packet length masking; dummy packet buffer; out.)
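Eqs. (12) and (13) fix the slot durations once the capacity split and the payload lengths are chosen, since $U_j = (H + Y_j)/C_j$. A small Python sketch of this bookkeeping follows; apart from $H = 20$ bytes and $E[X] = 550$ bytes, which come from Section 3.1, all values (packet rate, $\eta$, $\alpha$, $Y_1$, $Y_2$) are hypothetical.

```python
H = 20  # header bytes, as in Section 3.1


def split_slots(lam, ex, eta, alpha, y1, y2, h=H):
    """Capacities and slot durations implied by Eqs. (12)-(13).

    lam: input packet rate; ex: E[X] in bytes; eta: overhead fraction;
    alpha: share of the output capacity assigned to the small packet buffer.
    """
    c = lam * ex / (1 - eta)                 # total output capacity C
    c1, c2 = alpha * c, (1 - alpha) * c      # C1 = alpha C, C2 = (1 - alpha) C
    u1, u2 = (h + y1) / c1, (h + y2) / c2    # U_j = (H + Y_j) / C_j
    return c, c1, c2, u1, u2


# Hypothetical operating point; only E[X] = 550 bytes comes from the text.
c, c1, c2, u1, u2 = split_slots(lam=1000.0, ex=550.0, eta=0.4, alpha=0.3, y1=60, y2=480)
print(f"C = {c:.0f} B/s, U1 = {u1 * 1e6:.1f} us, U2 = {u2 * 1e6:.1f} us")
```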

Let $p_1 = P(X \le \theta)$ and $p_2 = 1 - p_1$. The average delay through this double buffer masking device is

$$E[D] = p_1 E[D_1(\theta, \alpha, Y_1)] + p_2 E[D_2(\theta, \alpha, Y_2)] \qquad (14)$$

where, from Eq. (9), we derive

$$E[D_j] = U_j \left( \frac{1}{2} + E[S_j] + \frac{\lambda p_j U_j E[S_j^2]}{2\left(1 - \lambda p_j U_j E[S_j]\right)} \right), \qquad j = 1, 2 \qquad (15)$$

with $E[S_1^r] = E[\lceil X/Y_1 \rceil^r \mid X \le \theta]$ and $E[S_2^r] = E[\lceil X/Y_2 \rceil^r \mid X > \theta]$, for $r = 1, 2$. In Eq. (14) we have emphasized that the delays through the small and large packet queues depend on the input length threshold $\theta$, the capacity split coefficient $\alpha$, and the choices of the fixed output payload lengths $Y_1$ and $Y_2$; $U_1$ and $U_2$ can be expressed as functions of $Y_1$ and $Y_2$, respectively, by using Eqs. (12) and (13). The optimal average delay is

$$E[D]^{*} = \min_{\theta, \alpha} \left\{ p_1 \min_{Y_1} E[D_1(\theta, \alpha, Y_1)] + p_2 \min_{Y_2} E[D_2(\theta, \alpha, Y_2)] \right\} \qquad (16)$$

The optimization is carried out under the constraint that both queues are stable, i.e., $\lambda p_1 (H + Y_1) E[S_1] < C_1$ and $\lambda p_2 (H + Y_2) E[S_2] < C_2$. The normalized delay-overhead trade-off of the single and double buffer masking devices is compared in Fig. 7. Thanks to the bimodal probability distribution of the input packet lengths, the double buffer masking device yields a significantly more favorable trade-off, allowing a lower overhead fraction for the same average delay. The most convenient range of $\eta$ appears to be between 0.2 and 0.5, where the double buffer masking device outperforms the single buffer one. For $\eta$ beyond about 0.6, the optimization of the double buffer masking device actually yields a single buffer design, i.e., the threshold and capacity split are set so that all traffic is diverted to the large packet buffer.³ The graphs in Fig. 8 show the optimal values of $\theta$ (upper left plot), $\alpha$ (upper right plot), and the output payload lengths $Y_1$, $Y_2$ (bottom plot) as functions of $\eta$. Above about $\eta = 0.6$, the optimization indicates that a single buffer is best. Splitting the output capacity between two buffers incurs a multiplexing inefficiency; at low to moderate overhead levels, this is more than compensated by the reduction of overhead obtained from a tailored choice of the output payload lengths for small and large packets. When a large overhead is allowed, the gain from exploiting the bimodal nature of the input packet PDF is outweighed by the multiplexing inefficiency (i.e., one queue can be idle while the other one is busy).
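The joint optimization of Eq. (16) can be sketched as a brute-force search in Python: for each threshold $\theta$ and split $\alpha$, the small and large packet branches are minimized independently over their payload lengths, using Eq. (15) for each branch and returning infinity for unstable branches. The grids and the three-point PMF are hypothetical and much coarser than what a real design would use.

```python
import math

H = 20  # header bytes, as in Section 3.1


def branch_term(sub_pmf, lam, cap, y, h=H):
    """p_j * E[D_j] of Eq. (15) for one buffer.

    sub_pmf holds the probabilities of the packet lengths routed to this
    buffer (they sum to p_j); cap is the buffer's output capacity C_j.
    Returns 0 for an empty branch and math.inf for an unstable one.
    """
    p = sum(sub_pmf.values())
    if p == 0:
        return 0.0
    es = sum(q * math.ceil(x / y) for x, q in sub_pmf.items()) / p
    es2 = sum(q * math.ceil(x / y) ** 2 for x, q in sub_pmf.items()) / p
    u = (h + y) / cap                      # slot duration from Eqs. (12)-(13)
    rho = lam * p * u * es
    if rho >= 1:
        return math.inf
    return p * u * (0.5 + es + lam * p * u * es2 / (2 * (1 - rho)))


def optimize_double(pmf, lam, eta, thetas, alphas, ys):
    """Brute-force version of Eq. (16).

    For each (theta, alpha) the two branches are minimized independently
    over the payload length, as the inner minima of Eq. (16) allow.
    Returns (best_delay, (theta, alpha, y1, y2)).
    """
    c = lam * sum(x * q for x, q in pmf.items()) / (1 - eta)
    best = (math.inf, None)
    for theta in thetas:
        small = {x: q for x, q in pmf.items() if x <= theta}
        large = {x: q for x, q in pmf.items() if x > theta}
        for alpha in alphas:
            d1, y1 = min((branch_term(small, lam, alpha * c, y), y) for y in ys)
            d2, y2 = min((branch_term(large, lam, (1 - alpha) * c, y), y) for y in ys)
            if d1 + d2 < best[0]:
                best = (d1 + d2, (theta, alpha, y1, y2))
    return best


pmf = {40: 0.45, 576: 0.10, 1500: 0.45}   # hypothetical bimodal PMF
delay, params = optimize_double(
    pmf, lam=1.0, eta=0.4,
    thetas=[40, 576],
    alphas=[k / 10 for k in range(1, 10)],
    ys=range(20, 1501, 20),
)
print(delay, params)
```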

Fig. 7. Trade-off between normalized average delay and output average overhead fraction for the single and double buffer masking devices. (Axes: normalized average delay vs. average overhead fraction $\eta$, 0–1; curves: single buffer, double buffer.)

³ Due to the equality sign in the split test ($X \le \theta$), the smallest input packets are still sent to the small packet buffer.

3.3. Confidentiality analysis

Since the obfuscated traffic flow is made of fixed-length packets emitted at fixed times, continuously in both directions, the adversary can gain no information on individual packet lengths or gap times from the observed traffic features $\Omega$. However, because the masked traffic parameters ($Y_0$ and $U_0$ for the single buffer masking device; $Y_1$, $Y_2$, $U_1$ and $U_2$ for the double buffer masking device) result from an optimization, some information on the original traffic flow leaks, namely no more than the packet length and gap time statistics used in the optimization procedure. The full information used in the optimization consists of the average packet arrival rate $\lambda$, the packet length PDF, and the target overhead level $\eta$. The adversary can set up equations involving these quantities by equating the known fixed rate of the tunnel with the average input information rate increased according to the overhead level, namely

$$\frac{\lambda p_j E[X_j]}{1 - \eta} = C_j = \frac{H + Y_j}{U_j}, \qquad j = 1, 2 \qquad (17)$$

with $E[X_1] = E[X \mid X \le \theta]$ and $E[X_2] = E[X \mid X > \theta]$. A similar expression holds for the single buffer device. Further equations can be derived by setting to zero the derivatives of the optimization objective with respect to the optimization variables, i.e., $Y_0$ for the single buffer device, and $\alpha$ and $\theta$ for the double buffer device. The extent of the leakage due to the optimized parameters is investigated in the next subsections. We denote the observed traffic features as $\Omega \equiv \{Y_0, U_0\}$ and $\Omega \equiv \{Y_1, U_1, Y_2, U_2\}$ for the single and double buffer masking devices, respectively. We distinguish two assumptions on the a priori knowledge $\Psi$ of the adversary: no prior knowledge ($\Psi = \emptyset$) and full prior knowledge ($\Psi = \{\mathrm{PDF}(X), \lambda, \eta\}$).
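Eq. (17) can also be read from the observer's side. The Python sketch below, offered as an illustration under stated assumptions, computes the tunnel rates $C_j$ from observed $Y_j$ and $U_j$, the input information rates $\lambda p_j E[X_j]$ they reveal when $\eta$ is known, and the ratio $C_1/(C_1 + C_2)$. Because (17) uses the same $\eta$ for both buffers, that ratio equals the share of input bytes carried by small packets whether or not $\eta$ is known; turning it into $p_{th}$ would additionally require the conditional means $E[X_1]$ and $E[X_2]$. The 20-byte header, the function names and the observed values in the example are hypothetical.

```python
def tunnel_rates(payloads, gaps, header=20):
    """Fixed output rates C_j = (H + Y_j) / U_j in bytes/s, visible to any observer."""
    return [(header + y) / u for y, u in zip(payloads, gaps)]


def inferred_input_rates(payloads, gaps, eta, header=20):
    """lam * p_j * E[X_j] = (1 - eta) * C_j, from Eq. (17), if eta is known."""
    return [(1 - eta) * c for c in tunnel_rates(payloads, gaps, header)]


def small_packet_byte_share(payloads, gaps, header=20):
    """Share of input bytes routed to buffer 1; the common eta cancels out."""
    c1, c2 = tunnel_rates(payloads, gaps, header)
    return c1 / (c1 + c2)


# Hypothetical observation: Y = (80, 1480) bytes, U = (8 ms, 12 ms).
print(small_packet_byte_share((80, 1480), (0.008, 0.012)))  # about 0.0909
```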

3.3.1. Adversary with no a priori knowledge

We calculate the leakage metric $\mathcal{E}$ with reference to the packet length PDF only, i.e., $U = X$. With no a priori information, the PDF of the packet lengths $X$ is uniform, which corresponds to the maximum entropy $H(U) = \log_2 n_X$, where $n_X$ is the number of possible packet length values. The equations that the adversary can derive once she observes the optimized parameters of the covering traffic can be simplified by approximating the moments $E[\lceil X/Y_0 \rceil^r]$ with $E[X^r]/Y_0^r$. Even then, in the case of the single buffer masking device, we get two equations involving four unknowns, i.e., $\eta$, $\lambda$ and the first two moments of $X$. In the case of the double buffer masking device we get four equations in the unknowns $\eta$, $\lambda$, $p_{th} \equiv P(X \le \theta)$,⁴ and the first two moments of the random variables $X_1 = X \mid X \le \theta$ and $X_2 = X \mid X > \theta$. In the following we upper bound the leakage by assuming that the adversary can nevertheless obtain information on the PDF of $X$, specifically up to the first two moments and the threshold probability $p_{th}$. For example, under the condition that $E[X]$ and $E[X^2]$ are known, i.e., $\Omega \equiv \{E[X], E[X^2]\}$, the PDF of the packet length $X$ can be found according to the maximum entropy principle as $p_k = P(X = k) = 2^{-a - b_1 k - b_2 k^2}$, $k = 1, \ldots, n_X$. The constants $a$, $b_1$ and $b_2$ are found from the normalization condition and by imposing the known values of the first two moments. The resulting entropy is $H(U \mid \Omega) = a + b_1 E[X] + b_2 E[X^2]$. Other known parameters can be dealt with similarly. Fig. 9 plots $\mathcal{E}$ as a function of $\eta$ for $\Psi = \emptyset$ and various assumptions on $\Omega$. If the adversary can only estimate the mean values of the packet length, the realized leakage is minimal, especially in the range of practical values of $\eta$. The assessment of $\mathcal{E}$ must take into account that full knowledge here (i.e., $\mathcal{E} = 0$) means that the adversary learns exactly the packet length PDF through the leakage of the protected tunnel, given that she has no a priori knowledge. Even this extreme result is just what we assume as the a priori knowledge of the adversary in the next subsection. Even knowing the PDF of packet lengths, the adversary cannot tell anything about the actual activity of the users communicating through the protected tunnel at any given time, nor can she tell whether there is any activity at all.

⁴ For example, for $0.2 \le \eta \le 0.6$ the optimal $\theta$ is always close to about 200 bytes. By observing the ratio $C_1/(C_1 + C_2)$, the adversary then gets a hint on the fraction of input packets whose lengths do not exceed $\theta$, from which $p_{th}$ can be estimated.

3.3.2. Adversary with full knowledge of the offered traffic parameters

We assume here that the adversary knows the set of applications that users in networks A and B can use, as well as statistics of the features of the traffic produced by those applications (e.g., volume of traffic of each session, sequence of packet lengths of each session). For example, if the adversary knows that most of the traffic has to do with web surfing and access to file repositories through web interfaces, she can infer that the most used application protocol is HTTP, and she could even collect samples of that kind of traffic from the open Internet to learn its statistical features. Formally, we are assuming here that $\Psi = \{\mathrm{PDF}(X), \lambda, \eta\}$. Even with such a priori knowledge, the adversary cannot learn anything more about the actual user activity in a given observation period during which she captures the traffic carried on the protected tunnel. Specifically, the adversary cannot tell whether any given captured packet actually carries user information or only dummy bits. Any hypothesis the adversary makes is based entirely on her prior knowledge, and no further evidence is gained by observing the constant rate traffic carried continuously on the tunnel in both directions. Formally, we have $\mathcal{E} = 1$ in this case. What is fully protected against such a knowledgeable adversary is the actual activity of the users at any given time. The (a priori) knowledge of the adversary concerns only global, long-term characteristics of the traffic (e.g., packet length PDF, type of applications used and possibly their relative frequency). The adversary cannot tell, however, which users in networks A and B are actually communicating at any given time, whether there is any communication at all, or what it consists of (e.g., which application is being used at any given time). In the ensuing sections we move from a theoretical model, mainly aimed at a basic understanding of the interplay of the masking device parameters, to a realistic example based on network protocol emulation of a full-blown, real application.

Fig. 8. Optimal values of: (i) the input packet length threshold $\theta$ (upper left plot); (ii) the small packet buffer output capacity fraction $\alpha$ (upper right plot); and (iii) the output payload packet lengths $Y_1$ and $Y_2$ (bottom plot, logarithmic scale, large and small packets) as a function of $\eta$.

Fig. 9. Normalized conditional entropy $\mathcal{E}$ of the packet length PDF estimated by an adversary with no a priori knowledge on the packet length PDF as a function of $\eta$. [Plot: $\mathcal{E}$ versus $\eta$ (0.1 to 0.6), one curve per set of moments and threshold probabilities assumed known to the adversary.]
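The maximum entropy estimate of Section 3.3.1 can be computed numerically. The Python sketch below, using only the standard library, finds $a$, $b_1$, $b_2$ such that $p_k = 2^{-a - b_1 k - b_2 k^2}$ matches given $E[X]$ and $E[X^2]$ on $k = 1, \ldots, n_X$, and returns the normalized conditional entropy, assumed here to be $\mathcal{E} = H(U \mid \Omega)/H(U)$ with $H(U) = \log_2 n_X$. Solving the convex dual problem with damped Newton steps is our choice of method for this sketch, not necessarily the procedure used to produce Fig. 9, and the function name is hypothetical.

```python
import math


def max_entropy_leakage(n, mean, second, tol=1e-10, max_iter=200):
    """Maximum-entropy PMF on {1..n} with given E[X] and E[X^2].

    Returns (pmf, a, b1, b2, E) with pmf[k-1] = 2**(-a - b1*k - b2*k**2) and
    E = H(U | Omega) / log2(n), the normalized conditional entropy (assumed).
    """
    xs = [k / n for k in range(1, n + 1)]         # rescaled support in (0, 1]
    mu = (mean / n, second / n ** 2)               # rescaled target moments

    def dual(beta):
        # Convex dual: log-partition (natural log) plus linear moment term.
        expo = [-beta[0] * x - beta[1] * x * x for x in xs]
        top = max(expo)
        log_z = top + math.log(sum(math.exp(e - top) for e in expo))
        return log_z + beta[0] * mu[0] + beta[1] * mu[1], log_z, expo

    beta = [0.0, 0.0]
    for _ in range(max_iter):
        f, log_z, expo = dual(beta)
        p = [math.exp(e - log_z) for e in expo]
        m = [sum(pi * x ** r for pi, x in zip(p, xs)) for r in range(1, 5)]
        g = (mu[0] - m[0], mu[1] - m[1])           # gradient of the dual
        if math.hypot(g[0], g[1]) < tol:
            break
        # Hessian = covariance matrix of (x, x^2) under the current PMF.
        h11, h12, h22 = m[1] - m[0] ** 2, m[2] - m[0] * m[1], m[3] - m[1] ** 2
        det = h11 * h22 - h12 * h12
        d = (-(h22 * g[0] - h12 * g[1]) / det, -(h11 * g[1] - h12 * g[0]) / det)
        slope, t = g[0] * d[0] + g[1] * d[1], 1.0
        # Backtracking (Armijo) line search keeps every step a descent step.
        while t > 1e-12 and (
            dual([beta[0] + t * d[0], beta[1] + t * d[1]])[0] > f + 1e-4 * t * slope
        ):
            t /= 2
        beta = [beta[0] + t * d[0], beta[1] + t * d[1]]

    _, log_z, expo = dual(beta)
    pmf = [math.exp(e - log_z) for e in expo]
    entropy = -sum(p * math.log2(p) for p in pmf if p > 0)
    a = log_z / math.log(2)
    b1 = beta[0] / (n * math.log(2))
    b2 = beta[1] / (n ** 2 * math.log(2))
    return pmf, a, b1, b2, entropy / math.log2(n)


# Sanity check: uniform moments on 1..1500 give back the uniform PMF, so E = 1.
n = 1500
print(max_entropy_leakage(n, (n + 1) / 2, (n + 1) * (2 * n + 1) / 6)[4])
```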

4. Framework for masking performance assessment

To assess the impact of masking techniques in terms of both packet delay and overhead, we have set up an emulation-based testbed where masking is applied to a link between the subnetworks $N_A$ and $N_B$. To this end, we have used the Common Open Research Emulator (CORE) to emulate the behavior of computer applications and the traffic they generate, with and without masking. CORE has been chosen because it offers a lightweight representation of computer networks for emulation and makes it easy to run real applications that need to communicate over the network. In CORE, each host/router is emulated through a virtualization of the operating system. The virtualization replicates only the network stack and the functions strictly necessary for emulation, thus avoiding replication of the entire OS image, with considerable savings in memory. Even though the network is emulated, the exchange of messages between two nodes is real, in the sense that actual protocol stacks are involved, as in an operational network with physical nodes. These features make CORE particularly attractive for emulating large scale networks on commodity hardware. For further details about CORE we refer the reader to [34]. Hosts in the two subnets $N_A$ and $N_B$ run the application used to test the performance of the masking device. In our experimental setting the masking device is realized on the edge routers connecting the subnet $N_A$ ($N_B$) to the intermediate insecure subnetwork $N_C$. All the traffic exchanged between hosts located in the subnets $N_A$ and $N_B$ is carried through a tunnel between the edge routers, and traffic masking is applied to the aggregated traffic of the tunnel. Traffic masking could also be realized at the end hosts involved; the practical implementation depends on the desired scope of the privacy coverage.

4.1. Trade-off between information leakage and performance of traffic masking

The masking device consists of a single buffer whose output is shaped to have fixed-length packets (payload equal to $Y_0$) sent at fixed times (inter-packet gap time equal to $U_0$).
Typical network traffic exhibits quite concentrated packet length probability distributions, with a few recurrent lengths and the majority of length values having very small probability. Most of the packets carried by operational networks are either relatively short (TCP ACKs, DNS messages, application signaling messages in general) or consistently long (with a typical length close to the Ethernet MTU of 1500 bytes). Having a single fixed payload length $Y_0$ at the output of the masking device therefore incurs heavy overhead. Overhead can be reduced by allowing different fixed length values $Y_j$, $j = 1, \ldots, \ell$, as shown in Section 3 for $\ell = 2$. The masking device is then configured as $\ell$ FIFO buffers, served by $\ell$ channels. The $j$-th channel sends packets with fixed payload $Y_j$ and fixed inter-packet gap time $U_j$. Incoming packets belonging to the traffic flow being masked are split among the $\ell$ buffers according to their length $X$. The decision thresholds of the traffic splitting can be optimized to minimize the mean delay through the masking device for a given overhead fraction. A multiple buffer masking device as described here deviates from the perfect masking device described in Section 2. In fact, information on the original packet length distribution leaks through the masking device: specifically, the fraction of the input traffic corresponding to each packet length interval can be roughly estimated by looking at the rates of the masking device output channels. The trade-off between leakage and performance can be controlled by choosing $\ell$. No leakage occurs for $\ell = 1$. The most detailed knowledge of the input packet lengths is leaked if $\ell$ is set equal to the number of different length values of the input traffic flows (at most as many as the MTU value in bytes, but usually far fewer; e.g., ten different length values can cover more than 90% of the probability mass of the packet length PDF). In the following we address the case $\ell = 2$. Our main point here is to assess the performance benefit brought about by a minimal leakage of input packet length information. Essentially, the leakage amounts to a rough estimate of the ratio between "small" and "large" packets in the input traffic flow. With $\ell = 2$ a single threshold $\theta$ must be chosen. Packets with lengths $X < \theta$ are directed to the "small" packet buffer, the others to the "large" packet buffer. Output packet payload lengths and inter-packet gap times are fixed at $Y_1$, $U_1$ for the "small" packet output channel and $Y_2$, $U_2$ for the "large" packet output channel.
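The $\ell$-buffer device described above can be prototyped with a short trace-driven simulation. The Python sketch below routes each packet to a buffer by length, fragments it into $\lceil X/Y_j \rceil$ fixed-size output packets (padding the last one), sends them in the next free slots of channel $j$ (one slot every $U_j$ seconds), and reports the mean delay and the overhead fraction, counting every slot, useful or dummy, as $H + Y_j$ output bytes. The function name, the 20-byte header and the example traces are assumptions for illustration; unlike the testbed, the sketch is open loop, so the application feedback discussed in Section 5 is not captured.

```python
import bisect
import math


def simulate_masking(packets, channels, thresholds=(), header=20):
    """Trace-driven sketch of an ell-buffer masking device.

    packets    : list of (arrival_time_s, length_bytes), sorted by arrival time
    channels   : list of (Y_j payload bytes, U_j gap seconds), one per buffer
    thresholds : ascending length thresholds (ell - 1 values); a packet of
                 length x goes to buffer bisect_right(thresholds, x), so with
                 one threshold theta: x < theta -> buffer 0, x >= theta -> 1
    header     : bytes H added to every output packet (assumed IPv4, 20 B)
    Returns (mean_delay_s, overhead_fraction).
    """
    next_free = [1] * len(channels)     # first unused output slot per channel
    delays, last = [], 0.0
    for t, x in packets:
        j = bisect.bisect_right(thresholds, x)
        y, u = channels[j]
        # Channel j emits one packet at every instant k*u, k = 1, 2, ...
        k = max(math.floor(t / u) + 1, next_free[j])  # first free slot after t
        s = math.ceil(x / y)                           # fragments; last one padded
        depart = (k + s - 1) * u
        next_free[j] = k + s
        delays.append(depart - t)
        last = max(last, depart)
    # Every slot carries (H + Y_j) bytes, useful or dummy, until the last departure.
    out_bytes = sum(math.floor(last / u + 1e-9) * (header + y) for y, u in channels)
    in_bytes = sum(x for _, x in packets)
    return sum(delays) / len(delays), 1 - in_bytes / out_bytes


# Hypothetical traces: a single buffer, then two buffers with theta = 829 bytes.
print(simulate_masking([(0.5, 250), (0.6, 50)], [(100, 1.0)]))
print(simulate_masking([(0.0, 50), (0.0, 1500)], [(80, 0.01), (1480, 0.02)], (829,)))
```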

4.2. Emulation testbed setting

To obtain a faithful yet simple emulation of the masking device, we have considered the layout shown in Fig. 10 for the single buffer masking device and the one in Fig. 11 for the double buffer masking device.

Fig. 10. Network topology of the framework emulating the single buffer masking device.

Fig. 11. Network topology of the framework emulating the dual buffer masking device.

In the topology emulating the single buffer masking device, we can identify three types of nodes: the application hosts belonging to the subnets $N_A$ and $N_B$, which run an application generating traffic to one or more hosts of the same type, placed in the same or in the other subnet; the dummy hosts H_dum_i, set to generate dummy packet flows that are mixed with the real traffic; and the two routers R_enc_i, specialized in multiplexing/demultiplexing useful and dummy traffic, which create a single stream of IP packets of the same length $L_0 = H + Y_0$ (by using fragmentation and padding) and send this fixed-length packet stream with a fixed inter-packet time $U_0$. In this topology, all traffic flowing between the two subnets $N_A$ and $N_B$ passes through an encrypted tunnel between the two routers R_enc1 and R_enc2. The topology with dual buffer contains the following nodes: the application hosts; two routers R_split_i that connect the subnets $N_A$ and $N_B$ with the insecure network $N_C$ and carry all the traffic going from $N_A$ to $N_B$ and vice versa; four dummy hosts H_dum_i; and four routers R_enc_i acting as endpoints of two encrypted tunnels. Routers R_enc_i are connected to subnet $N_C$ via Ethernet interfaces. R_enc1 and R_enc3 send out packets of fixed length $L_{small} = H + Y_1$, while R_enc2 and R_enc4 send out packets of fixed length $L_{large} = H + Y_2$. Routers R_split1 and R_split2 route traffic from subnet $N_A$ or $N_B$ to the insecure network $N_C$ (and vice versa). Their routing tables are configured so as to route each incoming packet based on its length. The routing algorithm is very simple. Consider router R_split1: after fixing a threshold $\theta$, every packet arriving from network $N_A$ undergoes a length check; if its length is smaller than $\theta$, the packet is forwarded to router R_enc1, while if it is greater than or equal to $\theta$, it is forwarded to router R_enc2.
Dummy hosts H_dum_i, for $i = 1, \ldots, 4$, have been designed to generate a continuous stream of dummy packets. Hosts H_dum1 and H_dum3 send packets of fixed length $L_{small}$, while hosts H_dum2 and H_dum4 set their packet lengths to $L_{large}$. In both topologies, dummy packets are generated by the H_dum nodes at a rate greater than what the connection between the edge routers of the subnets $N_A$ and $N_B$ can carry. The dummy packets are inserted into the stream of useful traffic to fill the idle time intervals in which there is no useful packet to send. The generation of dummy packets is handled by the open source software Scapy 2.2.0 [35]. Dummy packets are addressed to the dummy host opposite to the one that generates them; for example, host H_dum1 sends its packets to host H_dum3 and vice versa. Dummy hosts simply discard the packets they receive. All dummy packets are processed by the routers through static routing, so as to ensure that they fill the link capacity for which they were generated. Routers R_enc_i, for $i = 1, \ldots, 4$, create a continuous stream of packets, all of the same length and equally spaced in time. Consider router R_enc1: it receives as input all packets coming from R_split1 with length smaller than $\theta$, plus the continuous flow of dummy packets from host H_dum1. These packets are routed towards the eth0 interface to reach R_enc3. The eth0 interface is associated with a double buffer, used to assign different priorities to useful and dummy packets. Packets from R_split1 are conveyed to the high priority buffer, while dummy packets are routed to the low priority one. When the router needs to send a packet on eth0, it extracts the first packet available in the high priority queue. If the high priority queue is empty, the dummy packet buffer, which is always non-empty, is selected. Packets sent out on eth0 undergo fragmentation and/or padding to ensure that all outgoing packets have the same fixed length $L_{small}$, and are then encapsulated in an encrypted channel previously established with router R_enc3.⁵ In addition, the router receives packets from R_enc3 through the encrypted channel; they are decrypted, padding is removed, and the packets are routed according to their destination IP address. All other routers perform the same operations; the only difference is that the routers on the bottom link of the figure do not require fragmentation, since the fixed length $L_{large}$ of the protected tunnel is set equal to the MTU of the eth1 interface, i.e., $L_{large} = 1500$ bytes. This choice is motivated by the IP packet length probability distribution of the considered application (see Fig. 4): most of the mass is concentrated on relatively small and large packet sizes. This is typical of IP traffic.
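The strict-priority behavior of R_enc1 and the padding and padding-removal steps can be illustrated with the Python sketch below. It does not reproduce the testbed, which relies on IP fragmentation and an SSH tunnel: instead it assumes a hypothetical 2-byte field at the start of each fixed-size payload (15 bits of chunk length plus a more-fragments flag) so that the receiver can discard dummies, strip padding and reassemble packets. Encryption is omitted, and the class, function and field names are illustrative only.

```python
import struct
from collections import deque

FIELD = struct.Struct("!H")   # hypothetical 2-byte header inside each payload
MORE = 0x8000                 # flag bit: more units of this packet follow


def frame(packet, payload_size):
    """Split a non-empty packet into fixed-size units: field + chunk + zero padding."""
    room = payload_size - FIELD.size
    chunks = [packet[i:i + room] for i in range(0, len(packet), room)]
    units = []
    for n, chunk in enumerate(chunks):
        flag = MORE if n < len(chunks) - 1 else 0
        units.append(FIELD.pack(flag | len(chunk)) + chunk + bytes(room - len(chunk)))
    return units


class MaskingSender:
    """Strict priority: useful units first, a dummy unit when nothing is queued."""

    def __init__(self, payload_size):
        assert FIELD.size < payload_size <= FIELD.size + 0x7FFF
        self.payload_size = payload_size
        self.useful = deque()               # high-priority queue of framed units

    def enqueue(self, packet):
        self.useful.extend(frame(packet, self.payload_size))

    def next_unit(self):
        """Called once per output slot (every U seconds in a real shaper)."""
        if self.useful:
            return self.useful.popleft()
        return bytes(self.payload_size)     # dummy: field 0, all-zero padding


class MaskingReceiver:
    """Drops dummies, strips padding, and reassembles fragmented packets."""

    def __init__(self):
        self.partial = bytearray()

    def receive(self, unit):
        (field,) = FIELD.unpack_from(unit)
        if field == 0:                      # dummy unit (real chunks are non-empty)
            return None
        self.partial += unit[FIELD.size:FIELD.size + (field & 0x7FFF)]
        if field & MORE:
            return None
        packet, self.partial = bytes(self.partial), bytearray()
        return packet


tx, rx = MaskingSender(payload_size=80), MaskingReceiver()
tx.enqueue(b"x" * 200)
slots = [tx.next_unit() for _ in range(5)]   # 3 data units, then 2 dummies
print([p for p in map(rx.receive, slots) if p is not None])
```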

5. Case study: masking applied to a secure multiparty computation system

The masking technique with single or double buffer, based on dummy packets, fragmentation and padding, is especially meant to be applied to aggregated traffic carrying a significant amount of data over extended periods of time. While the model-based performance evaluation of Section 3 focuses on packet level metrics, in this section we use the network emulation testbed described in Section 4 to assess performance as seen from the application point of view. The delay is therefore the time taken to complete a significant application task. Delay and overhead are evaluated both with and without traffic masking. We remark that our experiments with a full-blown application and networking environment yield a realistic evaluation of the user-perceived performance. The delay introduced by the masking device affects the application dynamics, so that the traffic offered to the masking device depends on the masking device itself and its chosen parameters. This closed-loop interaction between the application state machine and the masking device cannot be accounted for in trace-driven simulations, where open-loop input traffic is offered to the device. Even if traces are collected in real networks, they represent a sample of packet traffic that does not account for the effect of introducing the masking device into the data path. To gain a full understanding of real operation, the experiments must be done by running the applications that generate the traffic to be protected and the masking devices simultaneously. In such a context, an application distributed over multiple hosts that requires continuous cooperation with a sustained transfer of data among them is well suited. We have chosen a distributed application of secure multiparty computation (SMC), in which several parties jointly compute a function over their inputs while keeping these inputs private. Given the security requirements of this kind of application, it makes sense to enhance the secure channels among the hosts with traffic confidentiality protection through masking. We used SEPIA, a Java implementation for generic secure multiparty computation developed at ETH Zurich [36]. SEPIA offers basic primitives optimized for processing high-volume input data. It is based on Shamir's secret sharing scheme and is secure in the honest-but-curious adversary model. Compared with other SMC tools, SEPIA allows operations to be grouped in rounds, which yields faster execution times per operation. For further information about SEPIA and its implementation please refer to [37]. The SMC system has been integrated into our masking mechanism by associating the application players with the hosts of the two subnetworks $N_A$ and $N_B$ of Fig. 11.

⁵ In our experiment, the SSH protocol has been used to establish a secure tunnel, but any other type of secure channel would be possible, e.g., an IPSec tunnel.
In the next subsections we detail the considered application scenarios and how the various players of the distributed platform are divided between subnets $N_A$ and $N_B$. Then we present the results of the performance tests obtained by emulating the SMC application in the networking scenario described in Section 4.

5.1. The SEPIA SMC architecture

Input Peers (IPs) and Privacy Peers (PPs) are the two kinds of entities defined in SEPIA. The IPs own private data that will be the input for the computation, while the PPs perform the secure, distributed computation on the data. Even if PPs are logically different from IPs, a host can take the role of both PP and IP without impairing the security level provided by the SMC system. In our testbed, however, IPs and PPs are always considered as separate entities, so a separate virtual machine is instantiated for each IP and each PP. Let $N$ and $M$ denote the number of IPs and PPs, respectively. Based on the Shamir secret sharing protocol implemented in SEPIA, the $i$-th IP generates a random polynomial $f(\cdot)$ and computes $M$ shares of its private input $a_i$, $i = 1, \ldots, N$. The $i$-th IP sends the $j$-th share to the $j$-th PP, $j = 1, \ldots, M$. The PPs collaborate to compute the function on the received shares, and the final result is sent back to all IPs. We do not dwell on how the shares are calculated and how the PPs operate to compute the final result. We only need to know that, since Shamir's secret sharing algorithm is linear, calculating the sum of $M$ inputs takes only one round of communication among the PPs, while a multiplication requires two rounds of communication. To embed the SMC system just described in the network topology of Fig. 11, we considered two mappings. In the first one (Fig. 12(a)) the IPs and the PPs are equally divided between the two subnets $N_A$ and $N_B$; in the second topology mapping (Fig. 12(b)), all IPs are placed in subnet $N_A$ and all PPs in $N_B$. In our experiments we have considered $N = 8$ IPs and $M = 6$ PPs. Among the several secure multiparty computation operations implemented in SEPIA, we selected the basic addition of shared secrets, which has the lowest computational cost; to obtain a significant volume of traffic among the nodes, we set the number of operations carried out in each round to S = 100,000.

Fig. 12. The two topology mappings considered in SEPIA performance evaluation experiments: (a) Topology 1; (b) Topology 2 (Input Peers and Privacy Peers placed in the two subnets).

We performed initial tests without masking and without any bandwidth limitation, in order to understand the behavior of the application in terms of the amount of traffic over the network $N_C$ of Fig. 12, the time evolution of the overall bandwidth required, and the time taken to execute a round. Fig. 13 shows the cumulative distribution function of the packet lengths observed across the network $N_C$. Most of the probability mass is concentrated around a couple of length values: roughly 50% of the packets are around 50 bytes long, while the remaining 50% have the maximum length allowed by the network configuration, namely 1500 bytes. This motivates the choice $L_{large} = 1500$ bytes for the fixed output packet size of the large packet queue of the masking device. In Fig. 14, the bidirectional bandwidth demand on the edge router connection through the network $N_C$ is plotted as a function of time for the two test topologies.
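The linearity argument of Section 5.1 can be illustrated with a toy Shamir scheme in Python. This is not SEPIA's API: the prime field, the threshold $t = 3$ and the function names are hypothetical choices made only for this sketch. With $N = 8$ input peers and $M = 6$ privacy peers, each privacy peer adds the shares it holds locally; in this sketch the sum is then recovered by interpolating any $t$ of the summed shares, which is the step where privacy peers must exchange data.

```python
import random

PRIME = 2 ** 61 - 1   # a Mersenne prime; field size chosen only for illustration


def make_shares(secret, m, t, rng):
    """Shamir shares of `secret` for m privacy peers; any t of them reconstruct."""
    coeffs = [secret % PRIME] + [rng.randrange(PRIME) for _ in range(t - 1)]
    shares = []
    for x in range(1, m + 1):               # share x is sent to privacy peer x
        acc = 0
        for c in reversed(coeffs):          # Horner evaluation of f(x) mod PRIME
            acc = (acc * x + c) % PRIME
        shares.append((x, acc))
    return shares


def reconstruct(points):
    """Lagrange interpolation of the sharing polynomial at x = 0."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num = den = 1
        for k, (xk, _) in enumerate(points):
            if k != i:
                num = num * (-xk) % PRIME
                den = den * (xi - xk) % PRIME
        total = (total + yi * num * pow(den, -1, PRIME)) % PRIME
    return total


def secure_sum(inputs, m, t, rng):
    """Input peers share their values; each privacy peer adds what it holds."""
    per_peer = [0] * m
    for a in inputs:                         # one input peer at a time
        for x, y in make_shares(a, m, t, rng):
            per_peer[x - 1] = (per_peer[x - 1] + y) % PRIME   # local addition
    # Any t summed shares determine the sum of all inputs.
    return reconstruct([(x, per_peer[x - 1]) for x in range(1, t + 1)])


print(secure_sum([10, 20, 30, 40, 50, 60, 70, 80], m=6, t=3, rng=random.Random(1)))  # 360
```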
For these measurements we considered a computational round in which no constraint is imposed on the bandwidth offered by the network $N_C$. Therefore, the only limit is the speed at which the processor can transfer data between two virtual routers. Under these conditions, a round lasts about 160 s.

We have separated the traffic due to packets with lengths less than the threshold $\theta$ from that due to packets with lengths at or above the threshold. We have chosen $\theta = 829$ bytes.⁶ The large, almost flat central part of the plot in Fig. 13 shows that the performance of the double buffer masking device is only weakly sensitive to the specific value chosen for $\theta$, provided it is not close to the extremes of the length range. In particular, Fig. 14(a) and (b) compare the temporal evolution of the bandwidth required by small packets for the two topologies, while Fig. 14(c) and (d) show the same comparison for packets with lengths at or above the threshold $\theta$. The evolution of the bandwidth used on the network $N_C$ for topology 1 shows a more uniform behavior during the round than for topology 2, where there is a time interval longer than 50 s during which no messages are exchanged through the network $N_C$. That time interval corresponds to the phase of a round in which the PPs perform their computation on the shares, so messages are exchanged only among PPs. On the other hand, when there is traffic carried by the network $N_C$, topology 2 results in a greater bandwidth demand than topology 1. In the next two subsections we evaluate and compare the performance of the two masking systems with single and double buffer described in Section 4.

5.2. Performance for single buffer masking

We test the behavior of the framework emulating the single buffer masking device, applied to the distributed multiparty computation system, by varying two parameters: the length $L_0 \equiv H + Y_0$ and the inter-packet time $U_0$ of the packets sent into the encrypted tunnel. We consider four values for the packet length: 100, 500, 1000 and 1500 bytes. To evaluate the cost of masking in terms of overhead and delay, we measure the average execution time of a round and the average overhead introduced per round. For comparison, the execution time has also been computed in the absence of masking, under the same bandwidth constraints. Unlike the analysis in Section 3, where the average packet delay is evaluated, here we account for the time required to complete an application round. Let $T_{mask}$ be the average execution time of a round with masking, and $T_{nomask}$ the same time without masking. Similarly, we define $S_{mask}$ as the average amount of traffic (in bit/s) carried over the tunnels of network $N_C$ when masking is applied, whereas $S_{nomask}$ is the same quantity with no masking. Fig. 15 plots the time ratio $T_{mask}/T_{nomask}$ as a function of the output bit rate $C = L_0/U_0$ of the routers R_enc1/2. The two graphs in Fig. 15 show that, for both topologies, the time ratio asymptotically tends to 1 as $C$ increases. The time ratio also gets closer to 1 the larger the output packet length $L_0$. Larger packet lengths give better performance in terms of overhead as well, as can be seen in Fig. 16, where the traffic ratio $S_{mask}/S_{nomask}$ is plotted as a function of the output bit rate $C$. There is clearly a trade-off between delay and overhead: as $C$ grows, the time ratio tends to 1, while the traffic ratio grows. While time ratio levels not far from 1 appear to be readily achievable, the traffic ratio never drops below about 2.7, i.e., quite a substantial overhead can be expected with a single buffer masking device. The trade-off between traffic ratio and time ratio is plotted in Fig. 17 for the two test topologies.

⁶ Given the fixed sizes of the output IP packets, namely $L_{small} = 100$ bytes and $L_{large} = 1500$ bytes, $Z = 829$ bytes is the largest length value such that fragmenting an IP packet of length $Z$ to the fixed length $L_{small}$ introduces less overhead than padding it to $L_{large}$.

Fig. 13. Cumulative distribution function of the packet lengths observed across the network $N_C$. [Plot: CDF versus input packet length, 0 to 1500 bytes.]

Fig. 14. Evolution of the bandwidth usage on the network $N_C$ for topologies 1 (left) and 2 (right). Top plots (a), (b) refer to traffic due to packets with length less than $\theta$; bottom plots (c), (d) refer to the rest of the traffic (packets with length greater than or equal to $\theta$). The threshold $\theta$ is set to 829 bytes. [Axes: rate in Mbit/s versus time in s, 0 to 150 s.]

Fig. 15. Performance of masking in case of single queue: duration ratio vs. output bit rate.

5.3. Performance for double buffer masking

Based on the bandwidth required in the network $N_C$ to perform the distributed computation, we set the two packet lengths to $L_{large} = 1500$ bytes and $L_{small} = 100$ bytes, and the output bit rate for small packets to $C_{small} = 100$ kbit/s at the routers R_enc1/3. As in the single buffer experiments, we measured the average execution time of a round and the average overhead introduced per round. For comparison, the execution time was also computed in the absence of masking, under the same bandwidth constraints. The tests were performed by varying the output bit rate $C_{large}$ of the routers R_enc2/4. Fig. 18(a) plots the ratio $T_{mask}/T_{nomask}$ as a function of $C_{large}$.

The graph shows that, for both topologies, the duration ratio asymptotically tends to 1 as the bandwidth increases. Notably, the curve for topology 2 is always well above the curve for topology 1, whose time ratio tends to 1 faster. Fig. 18(b) shows the ratio $S_{mask}/S_{nomask}$ as the output bit rate $C_{large}$ varies. For both topologies, the volume of traffic injected into the network must be at least double the volume produced by the application. The two curves in Fig. 18(b) are U-shaped (convex): the average overhead is minimum at $C_{large} = 200$ kbit/s for topology 1 and at $C_{large} = 1200$ kbit/s for topology 2. This behavior departs from the prediction of the analytical model. In fact, the traffic offered to the masking device in the analytical model is open loop, i.e., its characteristics do not change as the masking device parameters are varied. With the full-blown application of our testbed, messages are generated in response to delivered messages, according to the evolution of the state machine of the application entities. The delay inserted by the masking device induces further delays in the opposite direction of the data exchange between two peers. The time gaps in the application dialog pattern are filled with dummy packets by the masking device. As the allowed output capacity of the masking device is increased, the sending of messages is sped up and the dead times of the application entities are reduced. Hence, fewer dummy packets are required, and this beneficial effect more than compensates for the increased overhead due to the larger output bandwidth. Fig. 18(c) shows the trade-off between traffic ratio and time ratio. A time ratio close to 1 can be obtained at the price of very large overhead values. For a time ratio up to 1.2 for topology 1, and up to 1.8 for topology 2, the dual buffer masking device leads to a traffic ratio of 2.4. The minimum values of the traffic ratio are obtained at time ratios of about 1.3 and 2, respectively. This is not a bad result, considering that confidentiality protection is essentially complete, the only leaked information being the fraction of the overall input traffic consisting of small packets. By comparing the trade-off of Fig. 18(c) with that of Fig. 17, the improvement entailed by this small leakage can be appreciated.

Fig. 16. Performance of masking in case of single queue: traffic ratio vs. output bit rate.

Fig. 17. Performance of masking in case of single queue: trade-off between traffic ratio and duration ratio.

Fig. 18. Performance of masking in case of double queue.
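Since the output rate of each masking channel is $C = L/U$, the gap time to configure for a target rate follows directly, with a factor 8 when $L$ is in bytes and $C$ in bit/s. The Python lines below are a simple worked example deriving the gap times implied by the Section 5.3 settings ($L_{small} = 100$ bytes at $C_{small} = 100$ kbit/s, and $L_{large} = 1500$ bytes at the two values of $C_{large}$ where the overhead minima were observed); the function name is illustrative, and the same formula gives $U_0$ for the single buffer experiments.

```python
def gap_time(packet_len_bytes, rate_bit_s):
    """Inter-packet gap U = 8 * L / C giving output bit rate C for L-byte packets."""
    return 8 * packet_len_bytes / rate_bit_s


u_small = gap_time(100, 100e3)               # small packet channel: 0.008 s
print(f"U_small = {u_small * 1e3:.1f} ms")   # prints "U_small = 8.0 ms"
for c_large in (200e3, 1200e3):              # overhead minima of topologies 1 and 2
    u_large = gap_time(1500, c_large)        # 60.0 ms and 10.0 ms
    print(f"C_large = {c_large / 1e3:.0f} kbit/s -> U_large = {u_large * 1e3:.1f} ms")
```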

6. Conclusion

We address the protection of the confidentiality of traffic flows in a packet network against traffic analysis. We illustrate an effective, packet-level approach to confidentiality protection: it resamples packet sizes and inter-packet gap times from probability distributions that are independent of the original packet sizes and inter-packet gap times. As shown in previous works, the optimal masking device for full masking is a traffic shaper that outputs fixed-size packets at fixed times. Optimality consists in minimizing the average packet delay under a constraint on the average output overhead fraction. Overhead is due to packet fragmentation, padding and dummy packets. All operations entailed by the masking device can be readily performed in standard protocols; e.g., fragmentation and reassembly are natively provided by IP.

We analyze the proposed optimal masking device, made up of a single buffer, to find the delay-overhead trade-off. We highlight the significant performance advantage that can be gained on that trade-off if some information on the original traffic flow is leaked by giving up full masking. This can be done by using a double buffer masking device, with two buffers, to exploit the typically bimodal probability distribution of packet lengths. The flow to be protected is split into packets longer than a given threshold and packets no longer than the threshold. Each of the two sub-streams is handled as in the single buffer masking device, while the overall output bandwidth is split to serve the two buffers. A detailed analysis based on a queueing model provides insight into the double buffer masking device.

Then, we test our traffic masking approach on a secure multiparty computation distributed application, where confidentiality requirements are key to the application itself. We emulate the full-blown networking scenario and protocol stack by using virtual machines interconnected via physical (when hosted in different physical machines) or logical (when in the same physical machine) network interfaces. Both the single buffer and double buffer masking devices have been implemented. In the second case, the information leaked about the original traffic amounts to the average fraction of "small" packets (whose lengths are below a given threshold). It is shown that traffic masking can be attained for the considered application by accepting an increase of the traffic volume by a factor of about 2.4 and an increase of the application-level task completion time of about 30%. Hence, (almost) full confidentiality appears to be more appealing for contexts where delay constraints are more valuable than bandwidth. As a next step, we plan to implement the double buffer masking device in the OpenSSL suite and to experiment with the prototype implementation, so as to understand the resulting quality of experience for various applications.

References

[1] J.-F. Raymond, Traffic analysis: protocols, attacks, design issues, and open problems, in: Workshop on Design Issues in Anonymity and Unobservability, 2000, pp. 10–29.
[2] O. Berthold, H. Federrath, M. Köhntopp, Project anonymity and unobservability in the Internet, in: Proceedings of the Tenth Conference on Computers, Freedom and Privacy: Challenging the Assumptions, ACM, New York, NY, USA, 2000, pp. 57–65.
[3] T.T.T. Nguyen, G.J. Armitage, A survey of techniques for internet traffic classification using machine learning, IEEE Commun. Surv. Tutorials 10 (1–4) (2008) 56–76.
[4] H. Kim, K. Claffy, M. Fomenkov, D. Barman, M. Faloutsos, K. Lee, Internet traffic classification demystified: myths, caveats, and the best practices, in: Proceedings of the 2008 ACM CoNEXT Conference, CoNEXT ’08, ACM, New York, NY, USA, 2008, pp. 11:1–11:12.
[5] A.C. Callado, C.A. Kamienski, G. Szabo, B.P. Gero, J. Kelner, S.F.L. Fernandes, D.F.H. Sadok, A survey on internet traffic identification, IEEE Commun. Surv. Tutorials 11 (3) (2009) 37–52.
[6] M. Mellia, A. Pescapè, L. Salgarelli, Traffic classification and its applications to modern networks, Comput. Networks 53 (6) (2009) 759–760.
[7] Y.-s. Lim, H.-c. Kim, J. Jeong, C.-k. Kim, T.T. Kwon, Y. Choi, Internet traffic classification demystified: on the sources of the discriminative power, in: Proceedings of the 6th International COnference, Co-NEXT ’10, ACM, New York, NY, USA, 2010, pp. 9:1–9:12.
[8] A. Dainotti, A. Pescapè, K.C. Claffy, Issues and future directions in traffic classification, IEEE Network 26 (1) (2012) 35–40.
[9] B. Miller, L. Huang, A. Joseph, J. Tygar, I Know Why You Went to the Clinic: Risks and Realization of HTTPS Traffic Analysis, arXiv preprint arXiv:1403.0297.
[10] D. Herrmann, K.-P. Fuchs, H. Federrath, Fingerprinting techniques for target-oriented investigations in network forensics, in: Sicherheit, 2014, pp. 375–390.
[11] A. Hintz, Fingerprinting websites using traffic analysis, in: Privacy Enhancing Technologies, 2002, pp. 171–178.
[12] Q. Sun, D.R. Simon, Y.-M. Wang, W. Russell, V.N. Padmanabhan, L. Qiu, Statistical identification of encrypted web browsing traffic, in: IEEE Symposium on Security and Privacy, 2002, pp. 19–30.
[13] G.D. Bissias, M. Liberatore, D. Jensen, B.N. Levine, Privacy vulnerabilities in encrypted HTTP streams, in: Privacy Enhancing Technologies, 2005, pp. 1–11.
[14] S. Chen, R. Wang, X.F. Wang, K. Zhang, Side-channel leaks in web applications: a reality today, a challenge tomorrow, in: IEEE Symposium on Security and Privacy, 2010, pp. 191–206.
[15] C.V. Wright, L. Ballard, F. Monrose, G.M. Masson, Language identification of encrypted VoIP traffic: Alejandra y Roberto or Alice and Bob?, in: USENIX Security, 2007, pp. 1–12.

[16] A.M. White, A.R. Matthews, K.Z. Snow, F. Monrose, Phonotactic reconstruction of encrypted VoIP conversations: hookt on fon-iks, in: IEEE Symposium on Security and Privacy, 2011, pp. 3–18.
[17] A. Das, P. Pathak, C.-N. Chuah, P. Mohapatra, Contextual localization through network traffic analysis, in: Proc. of IEEE INFOCOM 2014, 2014, pp. 925–933.
[18] A. Back, U. Möller, A. Stiglic, Traffic analysis attacks and trade-offs in anonymity providing systems, in: Information Hiding, 2001, pp. 245–257.
[19] K.P. Dyer, S.E. Coull, T. Ristenpart, T. Shrimpton, Peek-a-boo, I still see you: why efficient traffic analysis countermeasures fail, in: IEEE Symposium on Security and Privacy, 2012, pp. 332–346.
[20] C.V. Wright, S.E. Coull, F. Monrose, Traffic morphing: an efficient defense against statistical traffic analysis, in: Proceedings of NDSS, 2009, pp. 1–14.
[21] A. Iacovazzi, A. Baiocchi, Optimum packet length masking, in: International Teletraffic Congress, 2010, pp. 1–8.
[22] A. Iacovazzi, A. Baiocchi, Internet traffic privacy enhancement with masking: optimization and tradeoffs, IEEE Trans. Parallel Distrib. Syst. 25 (2) (2014) 353–362.
[23] M.J. Freedman, R. Morris, Tarzan: a peer-to-peer anonymizing network layer, in: Proceedings of the 9th ACM Conference on Computer and Communications Security, CCS ’02, ACM, New York, NY, USA, 2002, pp. 193–206.
[24] D.I. Wolinsky, H. Corrigan-Gibbs, B. Ford, A. Johnson, Dissent in numbers: making strong anonymity scale, in: Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, OSDI’12, USENIX Association, Berkeley, CA, USA, 2012, pp. 179–192.
[25] S. Le Blond, D. Choffnes, W. Zhou, P. Druschel, H. Ballani, P. Francis, Towards efficient traffic-analysis resistant anonymity networks, in: Proceedings of the ACM SIGCOMM 2013 Conference on SIGCOMM, SIGCOMM ’13, ACM, New York, NY, USA, 2013, pp. 303–314.
[26] S. Le Blond, D. Choffnes, W. Zhou, P. Druschel, H. Ballani, P. Francis, Towards efficient traffic-analysis resistant anonymity networks, SIGCOMM Comput. Commun. Rev. 43 (4) (2013) 303–314.
[27] S. Yu, T. Thapngam, H.I. Tse, J. Wang, Anonymous web browsing through predicted pages, in: IEEE Globecom Workshops, 2010, pp. 1581–1585.
[28] S. Yu, G. Zhao, W. Dou, S. James, Predicted packet padding for anonymous web browsing against traffic analysis attacks, IEEE Trans. Inform. Forensics Sec. 7 (4) (2012) 1381–1393.
[29] X. Luo, P. Zhou, E.W.W. Chan, W. Lee, R.K.C. Chang, R. Perdisci, HTTPOS: sealing information leaks with browser-side obfuscation of encrypted flows, in: Proceedings of NDSS, 2011, pp. 1–20.
[30] F. Zhang, W. He, X. Liu, Defending against traffic analysis in wireless networks through traffic reshaping, in: ICDCS, 2011, pp. 593–602.
[31] P. Venkitasubramaniam, T. He, L. Tong, S.B. Wicker, Toward an analytical approach to anonymous wireless networking, IEEE Commun. Mag. 46 (2) (2008) 140–146.
[32] A. Iacovazzi, A. Baiocchi, Investigating the trade-off between overhead and delay for full packet traffic privacy, in: IEEE International Conference on Communications (ICC), 2013, pp. 1345–1350.


[33] D. Gross, J.F. Shortle, J.M. Thompson, C.M. Harris, Fundamentals of Queueing Theory, fourth ed., Wiley-Interscience, New York, NY, USA, 2008.
[34] J. Ahrenholz, Comparison of CORE network emulation platforms, in: Military Communications Conference, 2010 – MILCOM 2010, 2010, pp. 166–171.
[35] P. Biondi, Scapy. .
[36] M. Burkhart, M. Strasser, D. Many, X.A. Dimitropoulos, SEPIA: privacy-preserving aggregation of multi-domain network events and statistics, in: USENIX Security Symposium, 2010, pp. 223–240.
[37] M. Burkhart, Sepia. .

Alfonso Iacovazzi received his MSc degree in Telecommunication Engineering from Sapienza University of Rome, Italy, in 2008, and his PhD degree in Information and Communications Engineering from the same University in 2013. Since March 2013 he has been a Postdoctoral Research Fellow at the DIET Dept., Rome, Italy, where he is part of the Networking Group. His main research interests include communications security and privacy, traffic analysis and monitoring, traffic anonymization, and cryptography (mathematical aspects and applications).

Andrea Baiocchi received his Laurea degree in Electronics Engineering in 1987 and his PhD degree in Information and Communications Engineering in 1992, both from the University of Roma "La Sapienza". Since January 2005 he has been a Full Professor in the Department of Information Engineering, Electronics and Telecommunications of the University of Roma "Sapienza". His main scientific contributions are on traffic modeling and traffic control in ATM and TCP/IP networks, queueing theory, radio resource management, MAC protocols, and traffic analysis for flow classification. His current research interests focus on wireless access protocols, specifically for VANET applications, and on the protection of traffic privacy against classification adversaries. His research activities have also been carried out within many national (CNR, MIUR) and international (European Union, ESA) projects, in which he has also taken coordination and responsibility roles. Andrea has published more than a hundred papers in international journals and conference proceedings and has participated in the Technical Program Committees of more than forty international conferences. He also served for ten years on the editorial board of the telecommunications technical journal published by Telecom Italia.
