TimeStream: Exploiting video streams for clock synchronization


Ad Hoc Networks 91 (2019) 101878


Ilir Capuni∗, Nertil Zhuri, Rejvi Dardha
University of New York Tirana, Kodra e Diellit, Tirana, Albania

Article info

Article history: Received 1 December 2018; Revised 25 March 2019; Accepted 30 April 2019; Available online 3 May 2019

Keywords: NTP; Clock synchronization; Video streams; HLS

Abstract

Network Time Protocol (NTP) is the most commonly used protocol for clock synchronization on the Internet. In the wake of frequent news about attacks that use NTP, we propose the TimeStream algorithm, which in its plain version (without any extra settings) successfully synchronizes computer clocks to a precision of seconds by processing data found in video streams that use the HTTP Live Streaming (HLS) protocol. A further increase of precision is possible with a proper setup of the origin video servers. TimeStream provides secure time synchronization without requiring new ports, new hardware, or expensive cryptography, and it uses information from traffic that already makes up at least 70% of all Internet traffic. We observe that some intrinsic properties of video streaming over HTTP and of TCP congestion control facilitate good performance of the proposed method even in adverse network conditions. These properties make TimeStream robust against the packet delay manipulation attack, in which an attacker adds specifically crafted delays to packets carrying time information. To the best of our knowledge, TimeStream may be one of the first methods of its kind that uses the structure and core features of video streams and of the underlying network transport protocol to extract information about time, and it opens a new line of research on almost real-time extraction of useful information from video content. © 2019 Elsevier B.V. All rights reserved.

1. Introduction

Clock synchronization of devices is a paramount concern in distributed systems, as many operations are time sensitive and require that the timekeeping devices in these independent entities be synchronized. For a clock to improve its accuracy and precision, it must consult a time reference that has a higher precision and accuracy. These time references vary in their time measurement mechanisms, from mechanical town clock towers to time servers that calculate the time using the frequency of special references such as quartz crystals or caesium atoms. Since it is infeasible to equip all computers with components that would guarantee that the clock does not drift, computer clocks must constantly be adjusted to make sure that they show accurate time.



∗ Corresponding author.
E-mail addresses: [email protected] (I. Capuni), [email protected] (N. Zhuri), [email protected] (R. Dardha).
https://doi.org/10.1016/j.adhoc.2019.101878
1570-8705/© 2019 Elsevier B.V. All rights reserved.

Traditionally, the time of a computer system is synchronized using the Network Time Protocol (NTP), where a machine contacts a time server over the network in order to synchronize to a reference clock. According to the NTP specification (see [1]), after receiving the time from a time server, a client computes the network delay and performs some filtering and statistical analysis in order to compute an offset that is applied to the system gradually. That is, the clock adjustment is performed in slow increments, because slow steps do not cause issues with software timers or introduce strange gaps in log files.

1.1. Security vulnerabilities and clock synchronization

The non-broadcast version of NTP and its derivatives are based on the following paradigm: a client sends a request and a server sends a reply. All the communication is carried over UDP. This allows a server to send a response that is disproportionate in size to the request. Many network cyber attacks exploit NTP ports or some recently found vulnerabilities of its design. Malhotra et al. [2] show how NTP can be deactivated on most of the computers on the Internet using only a single attacking machine that performs a Denial of


Service (DoS) attack, or how a client's time can be changed by hijacking NTP traffic through network routing, an attack also known as a timeshift. Other attacks change a computer's clock with spoofed NTP messages, which then enables numerous attacks based on bypassing cryptographic key expiration (see [3]). DoS attacks have been performed using only a small query to an NTP server, with the return address set to the address of the targeted server [4]. Clearly, by distributing the attacking tasks over multiple machines, we get a plain version of a Distributed Denial of Service (DDoS) attack.

Furthermore, the plain vanilla version of NTP supports a monitoring service that enables administrators to query an NTP server for a traffic count. This command, called "monlist", returns to the requester a list of the last 600 hosts that connected to the server. This disproportionate response can be used to devise a simple amplification attack: an attacker repeatedly issues a "get monlist" command to an NTP server while spoofing the requester's IP address to be the victim's IP address. The server responds by sending this large list to the victim's machine.

Network time attacks are commonly used to desynchronize machines and make them unable to communicate with each other, as in the time synchronization attacks on sensor networks explained in [5], or in the series of successful attacks on NTP that deny the time service or shift the time of machines, given in Malhotra et al. [2]. A malicious man-in-the-middle can intercept NTP and Precision Time Protocol (PTP) packets and add some carefully computed delay, ultimately causing clock manipulation. Such attacks, known as packet delay manipulation, are a security concern in time synchronization over packet-switched networks, as stated in [6].

Recent publications like [7,8] show that the mere use of cryptographic tools does not suffice, because most of the public key constructs needed for this purpose are time dependent themselves. Leaving network port 123, which is used by NTP, open does present a system vulnerability. For this reason, many network administrators prefer to keep the NTP port closed, leaving servers or complete networks with inaccurate time.

1.2. Video stream approach

In this paper, we take a novel approach. We explore the possibility of getting the accurate time from a widely available pool

of data that contains information about time, without explicitly requesting it for clock synchronization purposes. Video streams have an explicit ordering of items and are pushed actively by the source. Even though multimedia streaming is still in its infancy, the Cisco Visual Networking Index [9] forecasts that Internet video streaming and downloading will exceed 80% of all Internet traffic over the period 2014–2019.

In this paper we demonstrate a method which, out of the box, allows a client machine that is acquiring a video, or any intermediate node close to the edge of the network that carries this traffic, to synchronize its clock to a precision of seconds by using information obtained by sniffing HTTP video streaming sessions. This somewhat rigid bound on the precision of the plain vanilla system comes from the fact that video streams by default carry timestamps whose resolution does not go beyond seconds. This can be changed by an appropriate setup on the server side, which increases the precision substantially, close to that of NTP.

Streaming platforms such as Apple's HTTP Live Streaming (HLS) and Dynamic Adaptive Streaming over HTTP, also known as MPEG-DASH, use HTTP streaming as their underlying delivery method and hence TCP as their transport protocol. This way of multimedia delivery, known as adaptive bitrate streaming, works by detecting network and device conditions in real time and adjusting the quality of a video stream accordingly. These protocols split the content into a sequence of small HTTP-based file segments, each segment containing a short interval of playback time of a long video. Alternative segments encoded at different bit rates and covering aligned short intervals of playback time are made available to the client, who can automatically select from the alternatives the next segment to download and play back based on current network conditions (see Fig. 1).

The client selects the segment with the highest bit rate that can be downloaded in time. In this way a client can seamlessly adapt to changing network conditions and provide high quality playback with fewer stalls or re-buffering events. To achieve this, HTTP video streaming traffic is rich in information about time. Time synchronization has a natural synergy with adaptive bitrate streaming, since one of the main goals of the latter is timely delivery of the video content. As we will show, TimeStream is naturally robust against the packet delay manipulation attack.
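The selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the bitrates are the ones that appear in the manifest excerpt and testbed description later in the paper, while the function name `select_variant`, the `safety` margin, and the fallback to the lowest variant are hypothetical choices and not part of any HLS player.

```python
# Variant bitrates in bits per second, as quoted later in this paper.
# The selection rule itself is an illustrative assumption.
VARIANT_BITRATES = [68_000, 408_000, 611_000, 810_000]


def select_variant(available_bps, bitrates=VARIANT_BITRATES, safety=1.0):
    """Return the highest bitrate that fits into the usable bandwidth.

    Falls back to the lowest variant when nothing fits, so that segments
    (and the timestamps they carry) keep arriving under congestion.
    """
    usable = available_bps * safety
    fitting = [b for b in sorted(bitrates) if b <= usable]
    return fitting[-1] if fitting else min(bitrates)


if __name__ == "__main__":
    for bandwidth in (2_000_000, 700_000, 100_000, 50_000):
        print(bandwidth, "->", select_variant(bandwidth))
```

Falling back to the lowest variant rather than stalling mirrors the behaviour the paper later reports under congestion, where the client ends up receiving the audio-only variant.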

Fig. 1. HLS paradigm: a video streamer prepares a video chopped in segments of different quality which are then distributed to the clients by a web server.


1.3. Related work

The security vulnerabilities discussed above have induced certain fixes and improvements of the NTP protocol, as well as alternatives to it such as TLSDate, specified in [10]. TLSDate is a more secure alternative to NTP for synchronizing time on local systems, developed by Jacob Appelbaum of the Tor Project. Instead of using time servers and a dedicated protocol as NTP does, TLSDate simply extracts the time from trusted servers to which it connects using TLS. Whenever a client connects through a secure TLS handshake, TLSDate extracts the remote time of the server and calculates the local time of the system, thus updating the time securely. In [11], the authors present a modular, composable formalism of the security requirements for network time protocols. A recent take on NTP's security vulnerabilities is the Chronos client proposed in [12]. A Chronos client periodically queries small subsets of a large pool of NTP servers to obtain time information. It then applies the proposed algorithm to remove outliers and averages over the remaining responses. In [13], the author uses the streams of networked computer games for time synchronization. The stream is used as a communication medium to send requests and receive responses. In [14], the authors develop a method to recover clocks in MPEG decoders, which requires at least 10 timestamps per second and is suitable for DVB terrestrial transmission. At first glance, our approach may look similar to the one in the cited work. However, our approach differs significantly, since our algorithm has to deal with the delays and bursty traffic that are typical for packet-switched networks and are not present in unidirectional DVB terrestrial transmissions.

1.4. Our contribution

In this paper we propose an algorithm that sniffs the HLS traffic to and from a server that has the reference time and computes the offsets that need to be applied to the machine in order to adjust its clock to the reference clock. The algorithm is passive, that is, it does not take an active role in the communication between the video player and the web server providing the stream. Similar to NTP, our algorithm filters the video traffic for appropriate samples, which are then processed to extract time and clock frequency estimates. A plain implementation of our algorithm, in which no special setup of the servers or of the streaming engine is allowed, achieves time accurate to seconds, since the DATE field of HLS segments does not allow a finer resolution. If the web server injects the exact UTC time into the HTTP packets of the segments, then TimeStream's error is at most 0.5 s. This accuracy can be increased further by running Marzullo's algorithm (see [15]) on three different video streams coming from different servers, as we elaborate in more detail in Section 3.3.
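To make the role of Marzullo's algorithm more concrete, the following Python sketch shows the interval intersection it performs. It is a minimal illustration, not the authors' implementation: the function name `marzullo` and the tuple representation of intervals are assumptions, and the sample intervals are the ones used in the example of Section 3.3.

```python
def marzullo(intervals):
    """Return (lo, hi, count): the interval covered by the most sources.

    `intervals` is a list of (lo, hi) pairs with lo <= hi. Interval starts
    are processed before interval ends at equal values, so intervals that
    merely touch are counted as intersecting.
    """
    events = []
    for lo, hi in intervals:
        events.append((lo, 0))  # 0 marks a start
        events.append((hi, 1))  # 1 marks an end
    events.sort()

    best = count = 0
    best_lo = best_hi = None
    for idx, (value, kind) in enumerate(events):
        if kind == 0:
            count += 1
            if count > best:
                best = count
                best_lo = value
                # The overlap lasts until the next event in sorted order.
                best_hi = events[idx + 1][0]
        else:
            count -= 1
    return best_lo, best_hi, best


if __name__ == "__main__":
    # 10 +/- 2, 12 +/- 1 and 11 +/- 1, written as ranges.
    print(marzullo([(8, 12), (11, 13), (10, 12)]))
```

When not all intervals intersect, the same sweep returns the region supported by the largest number of sources, which is the majority behaviour the paper relies on.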

Fig. 2. Frequency and time offsets.

2.1. Plain NTP time synchronization

In order to synchronize its time, a client usually polls three or more servers on different networks for the reference time. Upon request, the time servers respond with their timestamps. To synchronize the time, the protocol must calculate the time offset between the client's and the server's clocks as well as the round-trip delay of the network communication. In a sequence of such requests, a client obtains a sequence of $n$ 4-tuples

\[ (C_1^1, t_2^1, t_3^1, C_4^1),\ (C_1^2, t_2^2, t_3^2, C_4^2),\ \ldots,\ (C_1^n, t_2^n, t_3^n, C_4^n), \]

where $C_1^1$ and $C_4^1$ are the timestamps of the moments when the client issued the first request and received the first response, respectively, $t_2^1$ and $t_3^1$ are the timestamps of the moments when the server received the first request and issued the first response, respectively, and so forth. The obtained sequence is then statistically analyzed. Let us assume first that the frequency offset has been estimated and that the client clock's frequency has been adjusted appropriately. In the $i$th trial, the time offset is calculated using the following equation:

\[ x_i = \frac{(C_4^i - t_3^i) - (t_2^i - C_1^i)}{2}. \tag{1} \]

The offsets and the round-trip delays from the different sources are statistically analyzed and the best three are selected to synchronize the time. Using the offset, the computer clock frequency is adjusted to remove the offset difference. NTP estimates admit a systematic bias, since the travel time from the client to the server and the travel time from the server to the client usually differ. For this reason, the error of this method is within half of the round-trip time.
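As a small worked example of Eq. (1), the sketch below computes the offset and the round-trip delay from one 4-tuple. The function name and the sample numbers are illustrative assumptions. With the sign convention of Eq. (1), a positive offset means that the client clock is ahead of the server clock.

```python
def ntp_offset_and_delay(c1, t2, t3, c4):
    """Offset as in Eq. (1) and round-trip delay for one request/response.

    c1, c4: client clock when the request left and the response arrived.
    t2, t3: server clock when the request arrived and the response left.
    """
    offset = ((c4 - t3) - (t2 - c1)) / 2.0
    delay = (c4 - c1) - (t3 - t2)
    return offset, delay


if __name__ == "__main__":
    # Hypothetical case: client clock 0.5 s ahead, symmetric 20 ms paths,
    # 10 ms of server processing time.
    print(ntp_offset_and_delay(100.50, 100.02, 100.03, 100.55))
    # Asymmetric paths (10 ms out, 50 ms back): the estimate is biased,
    # but the error stays within half of the round-trip delay.
    print(ntp_offset_and_delay(100.50, 100.01, 100.02, 100.57))
```

The second call illustrates the systematic bias mentioned above: the true offset is still 0.5 s, but the asymmetry shifts the estimate by half of the difference between the two one-way delays.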

2. Background

Let us first spell out some definitions and notation about clock and time synchronization and some basic facts about HLS. Let $\tau(t) = t$ denote the true time and let $C(t)$ be the value that the clock displays at time $t$. The difference between these two values, $C(t) - \tau(t)$, is called the time offset. The difference between the frequencies of the clock and of the true time, $C'(t) - \tau'(t) = C'(t) - 1$, is called the skew, denoted as $\varphi - 1$ (see Fig. 2). A long-term change of a clock is called a drift. We call the clock $C(t)$ synchronized with a time source $\tau(t) = t$ when the frequency offset $\varphi$ and the time offset $\theta$ have been estimated and then applied.

2.2. Adaptive Bitrate Streaming – HLS

In HTTP-based streaming, a video is split into a sequence of segment files of fixed time length, which are placed on a web server for delivery. To play a video, a client first sends a request for a manifest file using the HTTP GET method. This file contains a list of URLs of the available segment files, which are acquired in sequence by referring to this list. Once the playback starts, the client constantly requests the next segment files and keeps the buffer filled. The manifest also bears information on the variants of the chunks to be downloaded based on the bandwidth, as shown in the following excerpt of the manifest.


Fig. 3. Polling a server for timestamps in rounds $i$ and $i + 1$, where $C_k^i = C(t_k^i)$.

    # variants
    #EXT-X-STREAM-INF:BANDWIDTH=408000,CODECS="mp4a.40.2,avc1.42C00C",RESOLUTION=320x180,AUDIO="audio-64"
    Tch-audio_track=64000-video=320000.m3u8
    #EXT-X-STREAM-INF:BANDWIDTH=611000,CODECS="mp4a.40.2,avc1.42C015",RESOLUTION=483x272,AUDIO="audio-64"
    Tch-audio_track=64000-video=512000.m3u8
    #EXT-X-STREAM-INF:BANDWIDTH=810000,CODECS="mp4a.40.2,avc1.42C01E",RESOLUTION=640x360,AUDIO="audio-64"
    Tch-audio_track=64000-video=700000.m3u8

A segment is easy to identify, as it starts with a specific frame that is tagged with a specific mark and is called the I-frame.¹ This leading frame carries the time information saved in the DATE field. The parameter segmentDuration denotes the length of a segment; its typical values are 1, 2, 4, 6, and 10 s. The encoder produces an intra-coded frame (I-frame) every fps · segmentDuration frames, where fps denotes the number of video frames per second. A typical value for fps is 25.

3. TimeStream algorithm

Each HLS segment sent by the server starts with an I-frame, and the DATE field of the HTTP response carrying it is set to the server's time in the format hh:mm:ss.² Communication between a video player and the server follows these conventions and messages.

Timestamp options set: for stream packet synchronization and bitrate analysis, adaptive bitrate streaming solutions enable the timestamp option in the TCP headers. Every packet of a stream must have timestamps enabled.

HTTP Request GET media: the HTTP requests for the stream from the client must include the GET media, that is, HTTP GET requests for the -audio and the -video tracks.

HTTP Response content-type: the HTTP response from the server must have a specific content-type option for the stream. According to the HLS specification, the content-type is set to application/vnd.apple.mpegURL or application/x-mpegURL.
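The conventions above suggest how a passive observer can pick out the time information. The Python sketch below is an illustration only: the helper names are hypothetical, headers are assumed to be available as a plain dictionary after the traffic has been captured and parsed, and only the standard HTTP Date header with one-second resolution is handled.

```python
from email.utils import parsedate_to_datetime

# Content types named in the text, compared case-insensitively.
HLS_CONTENT_TYPES = {"application/vnd.apple.mpegurl", "application/x-mpegurl"}


def _lowered(headers):
    """Return a copy of the headers with lower-case field names."""
    return {name.lower(): value for name, value in headers.items()}


def is_hls_playlist(headers):
    """True if the Content-Type header matches one of the HLS types."""
    ctype = _lowered(headers).get("content-type", "")
    return ctype.split(";")[0].strip().lower() in HLS_CONTENT_TYPES


def server_time_from_headers(headers):
    """Parse the HTTP Date header into a datetime, or return None."""
    value = _lowered(headers).get("date")
    return parsedate_to_datetime(value) if value is not None else None


if __name__ == "__main__":
    sample = {
        "Date": "Fri, 03 May 2019 12:00:04 GMT",
        "Content-Type": "application/vnd.apple.mpegURL",
    }
    print(is_hls_playlist(sample), server_time_from_headers(sample))
```

Each parsed server time would be paired with the local arrival time of the packet to form one sample for the estimation described in the rest of this section.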

¹ This is not strictly enforced, that is, a video player can do fine even without having the first frame tagged. However, in that case, moving forward and backward within the video is not possible. Since this kind of video navigation is important for user experience, we assume that the head frame is tagged by the default settings of HLS.

² It may seem unambitious to perform clock synchronization aiming only at seconds. However, as we will see below, it is possible to make the web server inject its 64-bit NTP timestamps into the headers of HTTP responses.

Recall the notation from Section 2.1 and how an NTP client acquires $n$ quadruples bearing appropriate timestamps. Graphically, such quadruples typically form shapes that are close to trapezoids, and for this reason we will sometimes refer to them as trapezoids (see Fig. 3). In contrast to an NTP client, our implementation monitors the traffic on the client side and classifies the incoming and outgoing traffic to identify the video streams acquired by the client. Once filtering for the streams is established, the algorithm processes the information embedded in the streams on the fly.

Since the clocks of the client and the server are not synchronized and may have different frequencies, inaccuracies are introduced in the measurement, as depicted in Fig. 4. Moon et al. [16] propose an efficient algorithm to estimate the clock skew in network delay measurements. We apply this technique to data collected from a video stream. We also show how to estimate the offset.

Recall from Section 2.1 that $t_3^i$ denotes the time when the $i$th packet bearing that timestamp value leaves the server and $C_4^i$ is the client clock value when that packet reaches the client. Timestamps $C_1^i$ and $C_4^i$ are stated according to the client's clock, which needs to be synchronized, whereas $t_2^i$ and $t_3^i$ are timestamps according to the "true" clock of the server. Let $l_i$ denote the timestamp, according to the client's clock, of the moment when the $i$th packet leaves the sender. Let $d_{2i}$ be the end-to-end delay of the $i$th packet as computed directly from the two timestamps, that is, let

\[ d_{2i} = C_4^i - t_3^i. \]

Further, let $\tilde t_s^i$ be the time duration between the first and the $i$th packet departures as determined by the clock at the server:

\[ \tilde t_s^1 = 0, \qquad \tilde t_s^i = t_3^i - t_3^1. \]

Let $\tilde t_r^i$ denote the duration between the first and the $i$th packet arrival times at the client:

\[ \tilde t_r^1 = 0, \qquad \tilde t_r^i = C_4^i - C_4^1. \]

From here, we have

\[ l_i - l_1 = \varphi\, \tilde t_s^i. \]

We now define the end-to-end delay on the inbound path as shown on the client's clock:

\[ d_2^1 = C_4^1 - l_1, \qquad d_2^i = C_4^i - l_i. \]

Using the relation $l_i - l_1 = \varphi\, \tilde t_s^i$ and the previous definitions, we have

\[ d_2^i = C_4^i - l_i = C_4^i - l_i + l_1 - l_1 = C_4^i - l_1 - (l_i - l_1) = C_4^i - l_1 - \varphi\, \tilde t_s^i. \tag{2} \]
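The identity in Eq. (2) can be checked numerically. The sketch below assumes, purely for illustration, a linear client clock $C(t) = \varphi t + \theta$; the function name and the sample values are hypothetical. It computes both sides of Eq. (2) for a few packets.

```python
def delay_identity_sides(phi, theta, departures, delays):
    """Return both sides of Eq. (2) for every packet.

    departures: true (server) send times t_3^i.
    delays: true one-way network delays of the packets.
    The client clock is assumed to be C(t) = phi * t + theta.
    """
    def clock(t):
        return phi * t + theta

    l = [clock(t) for t in departures]                          # l_i
    c4 = [clock(t + d) for t, d in zip(departures, delays)]     # C_4^i
    ts = [t - departures[0] for t in departures]                # sender durations
    lhs = [c - li for c, li in zip(c4, l)]                      # C_4^i - l_i
    rhs = [c - l[0] - phi * s for c, s in zip(c4, ts)]          # right side of (2)
    return lhs, rhs


if __name__ == "__main__":
    left, right = delay_identity_sides(
        phi=1.0001, theta=0.5,
        departures=[1000.0, 1004.0, 1008.0, 1012.0],
        delays=[0.05, 0.08, 0.06, 0.07],
    )
    for a, b in zip(left, right):
        print(f"{a:.6f} {b:.6f}")
```

Under this clock model, both sides equal the true network delay scaled by the frequency factor, that is, the delay as the client's clock would show it.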


Fig. 4. Timing chart demonstrating the delays, intervals and measurements.

Since the client does not know $l_1$, $d_2^i$ is useless from the computational perspective. Let $\tilde d_2^i$ denote the delay calculated from $C_4^i$ and $t_3^i$. We have

\[ \begin{aligned} \tilde d_2^i &= \tilde t_r^i - \tilde t_s^i \\ &= \tilde t_r^i + d_2^1 - \varphi\, \tilde t_s^i + (\varphi - 1)\, \tilde t_s^i - d_2^1 \\ &= d_2^i + (\varphi - 1)\, \tilde t_s^i - d_2^1. \end{aligned} \tag{3} \]

From here, using $d_{2i} = C_4^i - t_3^i$, we have

\[ \tilde d_2^i = d_{2i} - (C_4^1 - t_3^1). \]

Furthermore, if $\varphi$ and $d_2^1$ are given, then from (3) we get the end-to-end delay as shown on the client's clock:

\[ d_2^i = \tilde d_2^i - (\varphi - 1)\, \tilde t_s^i + d_2^1. \]

Let $\alpha$ and $\beta$ be the estimates of $\varphi$ and $d_2^1$, respectively. Now, the delay $\hat d_2^i$ with the skew removed is

\[ \hat d_2^i = \tilde d_2^i - (\alpha - 1)\, \tilde t_s^i + \beta, \tag{4} \]

which yields the current estimate of the offset, equal to

\[ \tilde d_2^i - (\alpha - 1)\, \tilde t_s^i. \]

For all delay measurements, it is natural to assume that $\hat d_2^i$ is non-negative, and hence we have

\[ \tilde d_2^i - (\alpha - 1)\, \tilde t_s^i + \beta \ge 0, \qquad 1 \le i \le n, \tag{5} \]

which gives us the constraints of our linear optimization problem, in which the objective function minimizes the distance between the line and all the delay measurements:

\[ \text{minimize} \quad \sum_{i=1}^{n} \left( \tilde d_2^i - (\alpha - 1)\, \tilde t_s^i + \beta \right). \]

Similarly, the problem can be stated for the outbound direction of the traffic, that is, when timestamps $C_1^i$ and $t_2^i$ are considered.³ Then, the estimated skew is half the sum of the solutions obtained from both optimization problems.

³ This is done solely for compatibility with NTP. Per specifications, $t_3^i = t_2^i + \text{segment\_size}$.

3.1. Solving the linear program

Using the fact that our data are sorted and that we are dealing with a linear program in two dimensions, we use a modified ellipsoid algorithm (see [17]) as presented in [16], which runs in $O(n)$ time. The reasons will become clear later; for now we just state that we set $n = 10$.

3.2. Putting it all together

Having explained all the components in detail, let us describe the entire technique. TimeStream collects time information from active video streams by first identifying them on the network and, if there are many, choosing one particular video stream. The collection is done in time intervals whose length ranges from 10 s to 120 s. When a packet bearing a $t_3^i$ timestamp arrives, we record the time $C_4^i$ and make a tuple out of the two values. The goal is to collect $n$ tuples (recall that $n = 10$). We store these tuples in an array, which is emptied at the beginning of each round. Let sample_counter denote the number of tuples received in the current interval. Once sample_counter reaches $n$, we solve the optimization problem, and once that finishes, we restart for the next interval. The complete process is given in pseudocode below.

Process 1 TimeStream.
1: sample_counter ← 0
2: Empty samples array
3: On timestamp arrival do
4:     Add the tuple to the samples array
5:     sample_counter++
6:     if sample_counter = 10 then
7:         solve the linear program
8:         goto 1
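A runnable Python sketch of Process 1 is given below as an illustration, not as the authors' implementation. The linear program with constraints (5) is solved here by brute force over pairs of samples, which is adequate for $n = 10$ but is not the $O(n)$ algorithm of [16]. The names `fit_skew` and `TimeStream`, the tolerance `EPS`, and the simulated clock in the demo are all assumptions.

```python
from itertools import combinations

N_SAMPLES = 10   # n = 10, as stated in the text
EPS = 1e-9       # numerical tolerance for the feasibility check (assumption)


def fit_skew(ts, d):
    """Solve the two-variable LP by brute force and return (alpha, beta).

    ts: sender-side durations; d: measured delays (receiver minus sender
    durations). An optimal line touches two samples, so every pair of
    samples is tried and the feasible line with the lowest cost is kept.
    """
    best = None
    for i, j in combinations(range(len(ts)), 2):
        if ts[i] == ts[j]:
            continue
        slope = (d[j] - d[i]) / (ts[j] - ts[i])   # candidate alpha - 1
        beta = slope * ts[i] - d[i]               # makes sample i tight
        residuals = [dk - slope * tk + beta for tk, dk in zip(ts, d)]
        if min(residuals) < -EPS:                 # violates constraints (5)
            continue
        cost = sum(residuals)
        if best is None or cost < best[0]:
            best = (cost, slope + 1.0, beta)
    if best is None:
        raise ValueError("need two samples with distinct send times")
    return best[1], best[2]


class TimeStream:
    """Collect (t3, c4) tuples and solve the LP after every n samples."""

    def __init__(self, n=N_SAMPLES):
        self.n = n
        self.samples = []

    def on_timestamp(self, t3, c4):
        """Record one sample; return (alpha, beta) once n samples exist."""
        self.samples.append((t3, c4))
        if len(self.samples) < self.n:
            return None
        t3_first, c4_first = self.samples[0]
        ts = [s - t3_first for s, _ in self.samples]
        tr = [r - c4_first for _, r in self.samples]
        d = [r - s for r, s in zip(tr, ts)]
        estimate = fit_skew(ts, d)
        self.samples = []                         # restart the round
        return estimate


if __name__ == "__main__":
    phi = 1.0005                                  # simulated client frequency
    process = TimeStream()
    for k, delay in enumerate([0.05, 0.08, 0.06, 0.09, 0.07,
                               0.10, 0.06, 0.08, 0.07, 0.05]):
        send = 1000.0 + 4.0 * k                   # one segment every 4 s
        print(process.on_timestamp(send, phi * (send + delay) + 3.0))
```

In the demo the first and last samples experience the smallest delay, so the fitted line passes through them and the recovered `alpha` matches the simulated frequency. With real traffic the estimate depends on how many samples arrive with near-minimal delay.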

3.3. Precision amplification using majority

As with NTP, Marzullo's algorithm can be applied to the time intervals obtained from the data of multiple streams, where a time interval is an interval $[a, b]$ whose endpoints $a$ and $b$ are two timestamps such that $a \le b$. If the majority of active video streams carry accurate time information, the local system will have accurate time, since the algorithm takes into consideration the majority of the clock and frequency corrections of the intersected intervals. Marzullo's algorithm finds the best interval using the intersection technique from a series of time intervals (see [15]), which we demonstrate in the following example.

Example. Let us consider the following time intervals: 10 ± 2, 12 ± 1 and 11 ± 1. The intersection of these intervals gives the best time for the system to synchronize to. The intervals can be written as ranges, and the intersection is calculated from the ranges:

10 ± 2: this interval can be expressed as the time from 8 to 12, that is, [8, 12].
12 ± 1: this interval can be expressed as the time from 11 to 13, that is, [11, 13].
11 ± 1: this interval can be expressed as the time from 10 to 12, that is, [10, 12].

Result: the best time according to these intervals is [11, 12], which can be expressed as 11.5 ± 0.5. This interval is obtained as the intersection of all the intervals. If the time intervals do not all intersect with each other, the algorithm takes the intersection supported by the largest number of sources as the best time for synchronizing the system time.

Fig. 5. HLS topology.

3.4. Note on security

Adaptive bitrate streams are delivered from existing types of HTTP web servers, and the streaming is done over HTTP (port 80 or 443). Using packets from these streams presents no danger for attacks on time synchronization. TCP spoofing, a security issue related to TCP, may succeed only if the attacker manages to enforce video play and to embed wrong information about time in the stream. In a packet delay manipulation attack, the added delay can be constant, jittered, or slowly wandering. The jittered delay injection attack cannot succeed, because the optimization problem is posed to eliminate such distortions. Performing a man-in-the-middle attack without introducing jitter and video delays (which in turn would provoke the user's reaction) requires large buffers and solid computational power on the attacker's side. Various filtering and curve fitting techniques can be introduced on the client's side to mitigate such attacks. However, Marzullo's amplification does decrease the success rate of injecting

any wrong time information into the streams and appears to be a universal fix for such problems.

4. Performance analysis and evaluations

Since the computation of the time and frequency correction in our algorithm is reduced to an optimization problem, for which an efficient algorithm yields the optimal result, the main concern is to demonstrate that the source of information for our algorithm is reliable, in the sense that we obtain enough samples carrying correct and useful information about time. Furthermore, besides precision, we are concerned with how network conditions affect the availability to our algorithm of stream chunks carrying information about the exact time. We will later discuss why plain HTTP traffic does not carry enough information for this purpose. Two evaluations are conducted. The first one is a fully fledged implementation serving as a proof of concept, and the other one is an emulation using the GNS3 network emulator.

4.1. The testbeds

The first testbed consists of an FFMPEG encoder running on an HP Z-400 that streams to a web server running Apache on a machine of the same kind, where both belong to the same network. The institutional Internet link is a symmetric 20 Mbps link. Clients are on residential networks with at least 5 Mbps download throughput and at most 1 Mbps uplink throughput. The web server's HTTP responses are customized to embed a UTC timestamp by using the module mod_headers.⁴ To verify accuracy, the clients and the web server concurrently synchronize their time using NTP.

⁴ See http://httpd.apache.org/docs/current/mod/mod_headers.html.

As depicted in Figs. 5 and 6, the second testbed consists of a GNS3 emulated network environment containing the following components:


Fig. 6. The same topology used to test NTP polling.

1. Cloud (NAT) GNS3 device.
2. R1: emulated Cisco 7200 router.
3. R2 and R3: emulated Cisco 3725 routers.
4. Five Ubuntu 16.04 64-bit virtual machines running on VMware Workstation 14 Pro, integrated in the GNS3 environment: one VM serving as the HLS streaming server and four VMs serving as clients accessing and playing the HLS stream via HTTP.

Link characteristics are as follows.

• R1 G1/0 to Server; Bandwidth: 300 Mbps; Delay: 5 ms
• R1 F0/0 to R2 F0/0; Bandwidth: 100 Mbps; Delay: 10 ms
• R1 F0/1 to R3 F0/0; Bandwidth: 50 Mbps; Delay: 8 ms
• R1 G2/0 to E0 Client3; Bandwidth: 8 Mbps; Delay: 10 ms
• R2 F0/1 to E0 Client1; Bandwidth: 20 Mbps; Delay: 15 ms
• R2 F1/0 to E0 Client2; Bandwidth: 15 Mbps; Delay: 5 ms
• R3 F0/1 to E0 Client4; Bandwidth: 12 Mbps; Delay: 14 ms

R1 is the emulated Cisco 7200 router, whereas R2 and R3 are Cisco 3725 routers. In order to provide media streaming to the clients, the NGINX web server is installed in the server VM. Before the NGINX web server is installed, it must be compiled with the RTMP (Real-Time Messaging Protocol) module to enable streaming to clients (an application in NGINX means an RTMP endpoint). As with the Apache settings, the same option for UTC time injection is applied here. The video segment size is 4 s, with the following correspondence between bandwidth and resolution.

1. For a bandwidth of 68,000 bps, send audio-64 only.
2. For a bandwidth of 408,000 bps, send video with the resolution 320 × 180, audio-64.
3. For a bandwidth of 611,000 bps, send video with the resolution 483 × 272, audio-64.
4. For a bandwidth of 810,000 bps, send video with the resolution 640 × 360, audio-64.

We record the traffic for 60 min in total.
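The link characteristics listed above can be summarised per client. The sketch below is an illustration under two assumptions that the text does not state explicitly: that the configured per-link delays are one-way and simply add up along the path, and that the topology is the tree implied by the link list. The names `LINKS` and `path_summary` are hypothetical.

```python
# (bandwidth in Mbps, delay in ms) for each link, keyed by (parent, child)
# in the tree rooted at the streaming server.
LINKS = {
    ("Server", "R1"): (300, 5),
    ("R1", "R2"): (100, 10),
    ("R1", "R3"): (50, 8),
    ("R1", "Client3"): (8, 10),
    ("R2", "Client1"): (20, 15),
    ("R2", "Client2"): (15, 5),
    ("R3", "Client4"): (12, 14),
}


def path_summary(client, links=LINKS, root="Server"):
    """Return (bottleneck bandwidth in Mbps, summed delay in ms)."""
    parent = {child: (par, bw, delay)
              for (par, child), (bw, delay) in links.items()}
    bottleneck, total_delay, node = float("inf"), 0, client
    while node != root:
        node, bw, delay = parent[node]   # step one hop towards the server
        bottleneck = min(bottleneck, bw)
        total_delay += delay
    return bottleneck, total_delay


if __name__ == "__main__":
    for name in ("Client1", "Client2", "Client3", "Client4"):
        print(name, path_summary(name))
```

Under these assumptions, every client's bottleneck link is well above the highest variant bitrate of 810,000 bps, so any switch to lower variants in the experiments would come from the added load rather than from the base configuration.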

4.2. Precision

In the first setting, we first check the time obtained from the samples in normal network conditions. Let us analyze Fig. 7a and b. The moments when TimeStream completes the computation of the time, together with the computed time values, are marked with dots. As depicted in Fig. 7a, TimeStream continuously succeeds in obtaining the estimates, and the time obtained is at most 0.025 s away from the straight line $f(t) = t$.

4.2.1. An (un)expected reaction to congestion

Let us now consider the case when the network is heavily loaded at a constant rate. The moments when TimeStream succeeds are spaced out. In particular, we point out the large gaps shown in Fig. 7b. Due to heavy traffic, long delays occur. Consequently, the video player on the client side requests lower resolution chunks, but the utilization of the Internet link increases. We have observed that an interesting phenomenon occurs at the moment when the available bandwidth is close to the lowest bandwidth declared in the manifest file. During this event, the client, having analyzed the network conditions, acquires the chunks with the lowest bitrate and resolution. The available bandwidth for the stream is reduced (by congestion) starting at 22:55:30 (see Fig. 7b for the obtained estimates). At 22:59:00, the available bandwidth reaches the lowest bandwidth in the manifest file and the client receives only audio. This brings the needed timestamps to the client, which continues computing the estimates of the skew and of the drift. Due to delays and the sparser chunks of time information that reach the client, the precision suffers: after the big gap, the time obtained from TimeStream is within 0.5 s of $f(t) = t$. Hence, the deviation of the time obtained by our algorithm ranges from 0.025 s to 0.5 s. This event is synergetic with the actions of the TCP congestion control mechanism, in which TCP reduces the sending rate once it becomes aware that congestion may have occurred.


Fig. 7. Diagrams showing time moments when the estimates are obtained and the values of the estimates.

Fig. 8. Numbers of packets carrying timestamps that reach client 1 in various network conditions.

4.3. Availability of chunks

In this section we analyze the availability of packets carrying information about time. For each client, we count the number of leading frames that reach the client per minute. The web server issues one such frame every 4 s. The polling interval of NTP is variable, with the so-called watchdog timer counting the time (in seconds) since the last update. In order to compare NTP with TimeStream, we artificially issue 32 uniformly spread NTP poll requests per minute and count the responses. To control the poll interval on the client side, the virtualized clients no longer use the default timedatectl/timesyncd to synchronize time. We perform these measurements under normal network conditions and in a congested network scenario where extra delays are introduced.

As shown in Figs. 8–11, HLS begins by sending the manifest and multiple packets that the client buffers before playback of the video begins. This is why the HLS diagrams begin with a high number of packets carrying time information. On congested networks, TimeStream is superior to NTP in this respect, because congestion control and HLS’s adaptive bitrate manage to keep the flow steady. Another reason why NTP may suffer on congested networks is that network devices routinely “sacrifice” UDP packets in favor of packets transported by TCP.

As depicted in Figs. 8–11, TimeStream obtains a steady stream of information about time regardless of network conditions: 14 tuples per minute on average, spaced at least 4.5 s apart on average, as we will see below. Even in a heavily loaded network, the average number of frames carrying time does not dip below 10 frames per minute, as guaranteed by the design of HLS and the setup parameters above and demonstrated in the experiments.

NTP yields the following results: out of the 32 requests per minute that each client sends to the NTP server, under normal network conditions all four clients received 10 responses per minute on average. In the heavily loaded network, our clients received approximately 8 responses.

In Figs. 12–15, we plot the distances between consecutive C4i in both settings for each client. The concern we address with these measurements is the sporadicity of chunks with time information. The distances between arrival times in the plots show that this difference stays close to a constant of about 4 s. Recall that the segment size of a video on the origin is 4 s and that each segment begins with the leading frame that carries the needed timestamp.
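The two quantities measured in this section, the number of time-carrying frames per minute and the distance between consecutive arrivals, can be computed from a list of arrival times as sketched below. This is an illustration only: the function names are hypothetical, and the arrival times are synthetic (one frame every 4.5 s), not the measured data behind Figs. 8–15.

```python
from collections import Counter


def frames_per_minute(arrival_times_s):
    """Count arrivals in each one-minute bin, with the first bin starting at t = 0."""
    bins = Counter(int(t // 60) for t in arrival_times_s)
    if not bins:
        return []
    # A Counter returns 0 for minutes in which nothing arrived.
    return [bins[m] for m in range(max(bins) + 1)]


def inter_arrival_gaps(arrival_times_s):
    """Distances between consecutive arrival times, in seconds."""
    ordered = sorted(arrival_times_s)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


# Synthetic arrivals: one timestamp-carrying frame every 4.5 s, from 0 s to 117 s.
arrivals = [4.5 * k for k in range(27)]
counts = frames_per_minute(arrivals)
gaps = inter_arrival_gaps(arrivals)

# Spacing of the artificial NTP polls: 32 uniformly spread requests per minute.
ntp_poll_spacing_s = 60 / 32

print(counts)               # [14, 13]
print(sum(gaps) / len(gaps))  # 4.5
print(ntp_poll_spacing_s)   # 1.875
```

With a 4.5 s spacing the first full minute contains 14 arrivals, which agrees with the average of 14 tuples per minute reported above.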


Fig. 9. Numbers of packets carrying timestamps that reach client 2 in various network conditions.

Fig. 10. Numbers of packets carrying timestamps that reach client 3 in various network conditions.

Fig. 11. Numbers of packets carrying timestamps that reach client 4 in various network conditions.

Fig. 12. Distances between C4i and C4i−1 for Client 1.


Fig. 13. Distances between C4i and C4i−1 for Client 2.

Fig. 14. Distances between C4i and C4i−1 for Client 3.

Fig. 15. Distances between C4i and C4i−1 for Client 4.

The measurements show that the average distance between consecutive C4i for HLS ranges between 4.5 s and 6.5 s. Clearly, for the HLS protocol, 4 s gaps on the origin video server are reflected as gaps of 4 + c seconds at the destination, where c ∈ [0.5 s, 3 s]. For NTP, the 1.875 s distances between C1i and C1i−1 are reflected, on average, as threefold distances between C4i and C4i−1 (see Fig. 3). While t2i − C1i depends on the traffic conditions, t3i − t2i depends on the server load and other factors. These findings point out that, even though TCP is much slower than UDP, the nature of the content (video transported by HLS over TCP) favors fast, timely and reliable transportation of time-sensitive information.

Now we can discuss the choice of parameter values. Since a segment is 4 s long, approximately 25 I-frames per minute can be obtained in normal conditions. For this reason, we have set n = 10. With this setup, as demonstrated in Fig. 7b, the largest time interval in which TimeStream cannot produce a time estimate is 140 s, which occurred in adverse network conditions when HLS, after a short stall, downgraded the multimedia content to audio only.

5. Conclusions and future work

The TimeStream algorithm presented in this paper synchronizes the clock of a machine using only information obtained from a video stream delivered over the HLS protocol, which runs on top of HTTP. By design, NTP allows a response that is asymmetric in size to its request, which enables amplification DoS attacks. Our algorithm does not issue any request: it simply analyzes information that is made available in video streams. According to CISCO estimates (see [9]), the demand for this kind of traffic is rapidly growing. In its plain version, our algorithm synchronizes the clock to within seconds. This limitation comes from the fact that, by default, the HLS server injects timestamps in the HH:MM:SS format. If we set the server to inject UTC timestamps, then the precision ranges from 0.025 s to 0.5 s.

An interesting feature of the proposed method is that it adapts quickly to network congestion. Indeed, TimeStream requires information from streams that use adaptive bitrate streaming protocols, which do adapt quickly to network conditions. Not only are HTTP video streaming protocols sensitive to network conditions, but so is their underlying transport protocol, TCP. Indeed, TCP is devised with a congestion control mechanism that actively changes the amount of traffic it transports according to the network conditions.

Our algorithm is scalable and CDN-friendly, and it benefits from CDNs by using Marzullo’s algorithm to obtain precise time information from different sources. TimeStream is suitable not only for the edge of the network but also for the network devices adjacent to the edge, as they can see the requests and responses of the edge devices involved in the stream. It remains to be explored whether our method is suitable for network devices that are far from the edges. As we are witnessing a massive proliferation of mobile devices that are capable of displaying, capturing and streaming multimedia content and that are equipped with GPS units, the need to synchronize the clock may become moot, since a GPS unit provides highly accurate time.
However, what is remarkable about the proposed method is that, with the necessary modifications, it can be used “in reverse”: clients that stream multimedia content from their mobile devices can act as the origin of accurate time for the device on the other side, which in turn can use Marzullo’s method to obtain accurate time from multiple streams.

It is natural to ask whether the same method would work on data captured from plain HTTP traffic not linked to video streaming. Unfortunately, the flow of data containing information about time is in this case sporadic and usually spread over multiple connections. Most importantly, unlike in the video streaming case, there is no incentive to keep the gaps between the responses of a web server constant and minimal, whereas adaptive streaming ensures that the packets carrying time information are transferred smoothly. This raises the need to identify other applications that produce a data stream over TCP with properties that can be exploited to extract useful information, such as time or geographical position, efficiently and with high precision.

It should be studied further whether a change of the transport protocol underlying HTTP affects TimeStream. Such protocols have already been announced (for QUIC see [18]), but there is no reason to believe that protocols “leaner” than TCP should introduce problems for TimeStream. Indeed, TimeStream depends heavily on the quality of the video streams carrying time information. It is natural to expect that the quality of video streaming will only rise with the introduction of these novel protocols; therefore, TimeStream and similar methods should only benefit from them.

An interesting issue is whether the TimeStream approach would also work with encrypted video traffic.
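Marzullo’s algorithm [15], which the conclusions invoke for combining time obtained from several streams, can be sketched as follows. This is a generic textbook version of the interval-intersection sweep, not the authors’ implementation; it assumes that each source contributes an interval (estimate minus and plus an error bound), and how TimeStream would derive such bounds is not specified here. The function name `marzullo` and the sample intervals are illustrative.

```python
def marzullo(intervals):
    """Return (lo, hi, count) for the interval on which the most sources agree.

    `intervals` is a list of (lo, hi) pairs, one per time source.
    """
    edges = []
    for lo, hi in intervals:
        if lo > hi:
            raise ValueError("interval bounds out of order")
        edges.append((lo, -1))   # -1 marks the start of an interval
        edges.append((hi, +1))   # +1 marks the end of an interval
    # At equal offsets, starts (-1) sort before ends (+1), so intervals that
    # merely touch are still counted as overlapping.
    edges.sort()

    best = 0
    count = 0
    best_lo = best_hi = None
    for i, (offset, kind) in enumerate(edges):
        count -= kind            # a start raises the overlap count, an end lowers it
        if count > best:
            best = count
            best_lo = offset
            best_hi = edges[i + 1][0]   # the overlap lasts until the next edge
    return best_lo, best_hi, best


# Three sources, each reporting an interval that should contain the true time.
print(marzullo([(8, 12), (11, 13), (10, 12)]))   # (11, 12, 3)
# A source that disagrees with the others is outvoted.
print(marzullo([(8, 12), (11, 13), (14, 15)]))   # (11, 12, 2)
```

The returned count indicates how many sources support the result, which could also serve as a sanity check on a single origin server.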
Even though this traffic is usually encrypted, some information about time (stored, for instance, in the parameter begin in the case of YouTube) is available in plaintext. This kind of traffic, as demonstrated in [19], has many interesting properties which can be very useful for synchronization of clocks. For instance, while the main video is streamed from the origin server, advertisements that are shown intermittently are usually streamed from different sources that are typically closer to the destination and whose time is synchronized with the server. These sources can be used in a natural way with Marzullo’s algorithm, or as a means of verifying that the time obtained from the main server is correct.

Conflict of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

[1] D.L. Mills, Internet time synchronization: the network time protocol, IEEE Trans. Commun. 39 (10) (1991) 1482–1493.
[2] A. Malhotra, I.E. Cohen, E. Brakke, S. Goldberg, Attacking the network time protocol, Network and Distributed System Security Symposium, 2016.
[3] J. Selvi, Bypassing HTTP Strict Transport Security, Black Hat Europe, 2014.
[4] D. Godin, New dos attacks taking down game sites deliver crippling 100gbps floods, 2014, http://arstechnica.com/security/2014/01/new-dos-attacks-taking-down-game-sites-deliver-crippling-100-gbps-floods/
[5] M. Manzo, T. Roosta, S. Sastry, Time synchronization attacks in sensor networks, in: Proceedings of the 3rd ACM workshop on Security of ad hoc and sensor networks, ACM, 2005, pp. 107–116.
[6] T. Mizrahi, Security requirements of time protocols in packet switched networks, 2014.
[7] A. Malhotra, S. Goldberg, Attacking ntp’s authenticated broadcast mode, ACM SIGCOMM Comput. Commun. Rev. 46 (2) (2016) 12–17.
[8] T. Rytilahti, D. Tatang, J. Köpper, T. Holz, Masters of time: an overview of the ntp ecosystem, in: 2018 IEEE European Symposium on Security and Privacy (EuroS&P), IEEE, 2018, pp. 122–136.
[9] CISCO, Forecast and methodology, 2014–2019, Cisco Visual Networking Index, 2015. http://www.cisco.com/c/en/us/solutions/collateral/service-provider/ip-ngn-ip-next-generation-network/white_paper_c11-481360.html
[10] J. Appelbaum, Tlsdate - secure parasitic rdate replacement, IOError, 2009. https://github.com/ioerror/tlsdate
[11] R. Canetti, K. Hogan, A. Malhotra, M. Varia, A universally composable treatment of network time, in: Computer Security Foundations Symposium (CSF), 2017 IEEE 30th, IEEE, 2017, pp. 360–375.
[12] O. Deutsch, N.R. Schiff, D. Dolev, M. Schapira, Preventing (network) time travel with chronos, in: Network and Distributed Systems Security Symposium (Proceedings of NDSS 2018), San Diego, CA, USA, 2018. https://doi.org/10.14722/ndss
[13] Z.B. Simpson, A Stream-Based Time Synchronization Technique for Networked Computer Games, ZBS, 2000. http://www.mine-control.com/zack/timesync/timesync.html
[14] H. Hassanzadegan, N. Sarshar, A New Method for Clock Recovery in MPEG Decoders, Basamad Negar Company, Tehran, Iran, 2002, pp. 1–8.
[15] K. Marzullo, S. Owicki, Maintaining the time in a distributed system, in: Proceedings of the second annual ACM symposium on Principles of distributed computing, ACM, 1983, pp. 295–305.
[16] S.B. Moon, P. Skelly, D. Towsley, Estimation and removal of clock skew from network delay measurements, in: INFOCOM’99. Eighteenth Annual Joint Conference of the IEEE Computer and Communications Societies. Proceedings. IEEE, 1, IEEE, 1999, pp. 227–234.
[17] P. Gács, L. Lovász, Khachiyan’s algorithm for linear programming, in: Mathematical Programming at Oberwolfach, Springer, 1981, pp. 61–68.
[18] A. Langley, A. Riddoch, A. Wilk, A. Vicente, C. Krasic, D. Zhang, F. Yang, F. Kouranov, I. Swett, J. Iyengar, et al., The quic transport protocol: design and internet-scale deployment, in: Proceedings of the Conference of the ACM Special Interest Group on Data Communication, ACM, 2017, pp. 183–196.
[19] P. Ameigeiras, J.J. Ramos-Munoz, J. Navarro-Ortiz, J.M. Lopez-Soler, Analysis and modelling of youtube traffic, Trans. Emerg. Telecommun. Technol. 23 (4) (2012) 360–377.

Ilir CAPUNI is an assistant professor of computer science at University of New York Tirana, where he established and chaired the Advanced Computation Research Center. His research is focused on reliable computation, computer networks and fault-tolerance in general. He obtained his PhD in computer science from Boston University in 2013. Currently, he is running the Excellence Labs Montenegro.


Nertil ZHURI is a Software Engineer and former instructor of Computer Science at University of New York Tirana. As a software engineer, he specializes in delivering cutting-edge software solutions to businesses in the form of web, desktop, and mobile applications. He obtained his MSc and BSc degrees from Epoka University in Tirana with a focus on computer networks, and worked as a research assistant in the same field on various projects related to streaming and wireless technologies. Nertil enjoys travel and photography, but can also be found designing and working on video games.

Rejvi DARDHA is a recently graduated networking engineer. He completed his Bachelor’s degree in Computer Science at the Faculty of Natural Sciences, University of Tirana, and obtained his MSc degree from University of New York Tirana with a strong focus on Computer Networking.