Passive browser identification with multi-scale Convolutional Neural Networks


Saeid Samizade (a,1), Chao Shen (a,1,*), Chengxiang Si (3), Xiaohong Guan (a,1,2)

(a) No. 28, Xianning West Road, Xi'an, Shaanxi, 710049, P.R. China

Abstract

Browser identification is the act of recognizing web traffic through surveillance despite the use of encryption or anonymizing software. Although previous work has reported some promising results, browser fingerprinting is still an emerging technique and has not yet reached an acceptable level of performance. This paper presents a novel approach that uses a deep-convolutional-neural-network-based (deep CNN) learning model to extract the complete shape of the traffic I/O graph signal and obtain stable traffic characteristics, and employs nonlinear multi-class classification algorithms to perform the task of browser identification. The approach is evaluated on a new dataset collected across a large number of websites. Extensive experimental results show that the traffic characteristics learned from the I/O graph by the deep CNN are much more stable and discriminative than the metrics obtained in earlier studies, and that the approach achieves a practically useful level of performance with significant precision and recall. Additional experiments on the depth of the deep CNN are provided to further examine the applicability of our approach. Our dataset is publicly available to facilitate future research.

Keywords: Browser Fingerprinting, Convolutional Neural Networks (CNNs), Net Flow

* Corresponding author.
Email addresses: [email protected] (Saeid Samizade), [email protected] (Chao Shen), [email protected] (Chengxiang Si), [email protected] (Xiaohong Guan)
1 Ministry of Education Key Lab for Intelligent Networks and Network Security, Xi'an Jiaotong University, P.R. China.
2 Center for Intelligent and Networked Systems and TNLIST Lab, Tsinghua University, P.R. China.
3 National Computer Network Emergency Response Technical Team/Coordination Center of China (CNCERT/CC).

Preprint submitted to Neurocomputing, October 17, 2019

1. Introduction

With the rise of ever-growing and sophisticated Web applications, web browsers have become the dominant interface connecting users to computing platforms. Because of the different implementations of web browsers, it is feasible to recognize a web browser by analyzing its network traffic data. Over the last decade, flow-recording applications have become more popular, and an increasing number of intrusion detection tools collect flow data for security purposes such as malware detection [1, 2] and even network attack detection [18, 44].

Machine learning methods have also developed rapidly in the last decade. Artificial Neural Networks (ANNs) play an important role in this field and are widely used to solve many problems (e.g., nonlinear control systems [15] and finding search policies for sequential learning tasks [8, 27]). Applying machine learning methods to the collected data can help network data analysts accurately measure the web traffic generated by real human users (rather than machines) and reduce various types of network intrusion and fraud. In other words, browser identification methods can be used to predict the likelihood of fraud. Another important motivation for browser identification is user identification and tracking [4, 6].

In a classifier, the main goal is to infer an accurate prediction from labeled samples of a dataset. In other words, the dataset consists of input-output pairs of the form (x, y), where x represents the set of features extracted from the traffic data (in this case) and y indicates the browser used in the flow. Browser identification is categorized into two types: active and passive. In active models, the client machine can be queried in various ways during data collection. In passive models, by contrast, there is no explicit querying of the client machine; passive methods rely on the precise classification of features extracted from the network flow.

Currently, the Convolutional Neural Network (CNN) and its variants have become an efficient solution for classification tasks and feature extraction problems [40, 14]. This model was designed for image classification tasks such as medical image analysis [5] and other computer vision tasks such as visual tracking [11]. In recent years, some researchers have used CNN models for other classification tasks (e.g., speech recognition [30]). Compared with typical deep neural networks [3], the CNN adds convolutional and max-pooling layers to implement an accurate model for classification.

Moreover, as a deep learning method, the CNN avoids the manual feature extraction procedure and generates a complicated non-linear model. Despite its power in feature characterization, the CNN can fail to achieve satisfying performance on several specific classification tasks; in other words, the model parameters need to be tuned carefully for each particular task.

In this paper, we present a novel multi-scale convolutional neural network which does not require any data preprocessing. Therefore, it can be used without change for multiple types of data. We first apply our proposed method to the browser identification task, which recognizes the browser by analyzing web traffic through surveillance despite the use of encryption or anonymizing software. Compared with existing systems and mechanisms, our traffic-learning-based approach for recognizing browsers is simpler but more efficient. As our model works passively and does not need any specific traffic preprocessing, it can be used directly for any type of traffic and network environment without changes. We spent two months building a comprehensive traffic dataset from the 500 most popular websites in China, covering four popular web browsers (Google Chrome, FireFox, Internet Explorer, and Opera) under Windows OS. We compare the performance of our proposed method with two common traditional classifiers (SVM [38] and Random Forest [16]) on our own dataset. The average identification accuracy of our proposed model reaches 97.55%, which significantly outperforms the traditional techniques and approaches a practically applicable level.

Our main contributions can be summarized as follows:

• Firstly, we propose a new CNN structure, which contains deep convolutional and max-pooling layers. This model can be applied to various types of classification tasks without extra changes.

• Secondly, we lay empirical groundwork for passive web browser identification on the I/O graph signal. Our results indicate that we successfully raise the browser identification accuracy to a practically applicable level.

• Thirdly, we establish a comprehensive web traffic dataset covering four mainstream web browsers and 500 websites. In terms of quantity and variety of data, our dataset remains high-quality, which supports and motivates further research in the related field.

• Fourthly, we have extracted the "data transfer rate" over time, combined it with the collected dataset, and embedded both of them in one single signal, which yields higher performance and efficiency in our results.

The remainder of this paper is organized as follows: Section 2 briefly introduces the basic background of browser identification and reviews previous work, the methodology is described in detail in Section 3, experimental results are given in Section 4, and conclusions and future work are discussed in Section 5.

2. Related Work

This section first gives a brief overview of the recent history of browser identification from the perspective of network security; the methods are then discussed from an artificial intelligence perspective.

In general, browser identification is categorized into two types: active and passive. In active models, the client machine can be queried in various ways during data collection. In passive models, by contrast, there is no explicit querying of the client machine; passive methods rely upon the precise classification of features extracted from the network flow. Moreover, encryption and traffic anonymization further reduce the discriminability of data from different sources. Hence, the passive model remains a challenging task for researchers. There is also another term, "browser fingerprinting", in this context. Here, fingerprinting does not refer to the biometric measurement of fingertips; it means identifying and extracting particular information about the browser using its available features (e.g., traffic data in this case). In view of the differences between active and passive fingerprinting methods [7, 9], a passive model has been chosen in this study to infer the implementation of web browsers or other network applications based on monitoring the traffic they send.

Many studies in the last decade have shown that TCP data is a significant characteristic of a net flow [10]. A more relevant work was a passive TCP fingerprinting method [12] concerning TCP/IP implementations, and TCP classification models have been proposed for this problem [13]. Recent developments in the field of browser fingerprinting have led to a renewed interest in passive techniques such as packet traces [17]. Other researchers have focused on statistical fingerprinting and network behavior [19, 20, 21]. Coarse flow recording is an increasingly important area in some studies [22, 23, 41]. In related lines of research, traffic data fingerprinting has been performed at different levels, such as OS and DNS fingerprinting [24, 10, 25, 6]. In recent years, much research has been done on browser fingerprinting [26, 28, 23].

An investigation has recently been conducted on browser fingerprinting via OS and hardware-level features such as the CPU and graphics card, reaching 99% accuracy [29], and some works have taken advantage of combining features [45] and of the Web Audio API [46] for the browser fingerprinting problem.

Our objective is thus to develop a simple and efficient traffic-learning-based approach for a passive browser recognizer that takes raw traffic I/O graph data as input. Unlike competing systems, our model does not require any specific traffic preprocessing; therefore, it can be used unchanged for any type of traffic. Because the input consists of sequences of raw traffic data, a basic approach could rely on sequential learning models and sequential pattern mining algorithms [31]. Moreover, recent developments in the field of deep learning [32] have led to a renewed interest in artificial neural networks [33]. In this paper, a novel passive browser identification system is built by proposing a new Convolutional Neural Network (CNN) [34, 35] that works on raw traffic data. As well as being website-independent, such a system has the advantage of being globally trainable, with the traffic features optimized along with the classifier. The rest of this article is organized as follows: the methodology is described in detail in Section 3, experimental results are given in Section 4, and conclusions and future work are discussed in Section 5.

3. Methodology

In this section, we employ a deep-convolutional-neural-network-based (deep CNN) learning model to extract the complete shape of the traffic I/O graph and obtain stable traffic characteristics, and we develop nonlinear multi-class classification algorithms to perform the task of browser identification. The starting point of this research is to classify the web browser by examining traffic data at the packet level. However, unlike the OS and hardware features used for browser fingerprinting in [29], our research combines multi-scale partitioning, feature extraction, and browser classification into a unified network. There are four major parts in our network: packet-level features, multi-scale partitioning, sub-signal-level representation, and the browser classification sub-network. The proposed CNN model is employed to solve the browser identification problem for the first time using the I/O graph signal as the input feature. A simple I/O graph signal is shown in Fig. 1; a comprehensive description of the I/O graph signal is given in Section 4.1.1.

Figure 1: A captured I/O graph signal of a net flow (y-axis: packets; x-axis: time in 100-millisecond intervals).

Given an input signal, the network outputs sub-signal-level classification results in a single forward propagation. Inside the network, an I/O graph signal is first fed into the convolutional layers, whose structure is that of a normal CNN operating on a one-dimensional input, to generate a hierarchy of feature maps. The feature maps are then subsampled by convolutional layers of the same size, which are stacked together in depth. Next, the maps are partitioned spatially into sub-signals of different sizes. Finally, the feature vector of each sub-signal is fed into the fully connected layers, which perform the multi-class classification for that sub-signal. The architecture of the proposed methodology is illustrated in Fig. 2; it takes the I/O graph signal as input and examines all of its sub-signals.

Figure 2: The architecture of the proposed model (input I/O graph signal → convolutional layers → feature maps → multi-scale sub-signals → signal-level / hyper features → multi-class classifier).

3.1. Packet-level features

All I/O graph signals in our dataset are scaled to a fixed length (200 values in this case). The feature generation part of our proposed network consists of several convolutional layers corresponding to a normal CNN model, which has achieved superior performance on image classification. Given the input signal, the convolutional layers produce a hierarchy of feature maps, whose sizes differ across layers. In a CNN architecture, every convolutional layer has

a particular receptive field size [36] (a two-dimensional block for images and a one-dimensional sub-signal in this case). The receptive field indicates the size of the I/O graph signal region to which all nodes of a feature map are connected. On the one hand, a smaller receptive field yields a finer feature map; on the other hand, a larger receptive field yields a coarser feature map. In our proposed network, the receptive field size increases along the convolutional layers, which enables the network to describe everything from lower-level local features to higher-level global context. The convolutional layer is shown in Fig. 3.
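To make the receptive-field growth concrete, the short helper below computes the receptive field after each layer of a 1-D convolution/pooling stack from the standard recurrences; the example kernel sizes and the assumption that each pooling layer's stride equals its window are ours, purely for illustration.

# Receptive-field growth along a stack of 1-D conv/pool layers.
# r: receptive field (in input samples); j: jump, i.e. the distance in
# input samples between two adjacent positions of the current map.
def receptive_fields(layers):
    """layers: list of (kernel_size, stride) tuples, in order."""
    r, j = 1, 1
    fields = []
    for k, s in layers:
        r = r + (k - 1) * j   # each layer widens the field by (k-1)*j
        j = j * s             # and multiplies the jump by its stride
        fields.append(r)
    return fields

# Illustrative stack: conv kernels 3, 5, 7 (stride 1), each followed by a
# pooling window of 2 with stride 2 (the pooling strides are an assumption).
example = [(3, 1), (2, 2), (5, 1), (2, 2), (7, 1), (2, 2)]
print(receptive_fields(example))   # receptive field after every layer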

Figure 3: The convolutional layer

According to Fig. 3, the procedure by which the feature maps are obtained along the convolutional layers is formulated as:

h_j^n = \max\left(0, \sum_{k=1}^{K} h_k^{n-1} \ast w_{kj}^n\right)    (1)

where h_j^n is the j-th feature map of convolutional layer n and w_{kj}^n is the kernel learnt in layer n.
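As a concrete illustration of Eq. (1), the following numpy sketch computes one output feature map as the rectified sum of 1-D convolutions of the previous layer's maps with learned kernels; the shapes, random data and kernels are illustrative only and are not taken from the paper.

import numpy as np

def conv_feature_map(prev_maps, kernels):
    """Eq. (1): h_j^n = max(0, sum_k h_k^{n-1} * w_kj^n) for one output map j.

    prev_maps: list of K 1-D arrays (feature maps of layer n-1)
    kernels:   list of K 1-D arrays (learned kernels w_kj^n)
    """
    acc = sum(np.convolve(h, w, mode="same") for h, w in zip(prev_maps, kernels))
    return np.maximum(0.0, acc)   # ReLU non-linearity

# Illustrative usage: 16 previous maps of length 200, kernels of size 3.
rng = np.random.default_rng(0)
prev = [rng.standard_normal(200) for _ in range(16)]
ws = [rng.standard_normal(3) for _ in range(16)]
h_j = conv_feature_map(prev, ws)   # one feature map of layer n, length 200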

3.2. Multi-scale partitioning

The Region of Interest (RoI) pooling layer designed for feature extraction in Fast R-CNN [37] is useful for the multi-scale partitioning projection from the packet-level space to the feature-level space, in order to obtain the features of each sub-signal. In the partitioning part, the generated feature maps are partitioned into g blocks with respect to different block sizes. A sub-signal size of l/N is defined for this research, where l is the length of the feature map and N is an integer number. Therefore, our feature maps are partitioned into N sub-signals. The partitioning is formulated as:

f^i(x) = f\left(x + i\,\frac{l}{N}\right), \quad 0 \le x < \frac{l}{N}    (2)

where f(x) denotes the generated feature map and f^i is the sub-signal which starts at the i-th element of the input I/O graph signal (0 ≤ i ≤ N − 1). All sub-signals on the feature maps refer to a region on the input I/O graph signal as:

I^i(x) = I\left(x + i\,\frac{L}{N}\right), \quad 0 \le x < \frac{L}{N}    (3)

where I(x) is an input I/O graph signal of size L. In our proposed method, each feature sub-signal describes its corresponding signal region. Additionally, multi-scale partitioning is performed on the input I/O graph signals by setting incremental values for N, which yields feature sub-signals of different sizes. The feature sub-signals describe local I/O graph signal regions of different sizes, and it is expected that they will be able to recognize any specific behavior in the sub-signals.

It is necessary for all units and operations of a Convolutional Neural Network to back-propagate the error differentials through the network. The multi-scale back-propagation in our proposed CNN model is formulated as:

\frac{\delta LF}{\delta F}(x) = \sum_{i = \lfloor x/K \rfloor} \frac{\delta LF}{\delta F^i}(x - iK), \quad 0 \le x < w    (4)

where LF refers to the loss function described in detail in Section 4.3, K is the number of scales (e.g. 3, 5 and 7), and \delta LF / \delta F^i denotes the i-th feature slice.

3.3. Sub-signal-level representation

In this part, the sub-signal-level representation is described in detail. As stated above, different scale values are used in multi-scale partitioning (five scales are used to partition the feature maps into 3, 5, 7, 9 and 11 feature sub-signals). It is necessary here to clarify exactly what our proposed model is and how it works. First, an input I/O graph signal is passed through five convolutional layers to obtain feature maps. Then, multi-scale partitioning is performed on the input signal with different sub-signal sizes. Finally, each sub-signal feature is normalized to the corresponding size in order to be fed into the nonlinear multi-class classifier at the end of our unified network.
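A sketch of the multi-scale partitioning described above (ours, not the authors' code): a feature map of length l is cut into N contiguous sub-signals for several values of N, following Eqs. (2)–(3), and each sub-signal is then resized to a fixed length; the linear-interpolation normalization and the target length are our assumptions about the normalization step.

import numpy as np

def partition(feature_map, n_parts):
    """Split a 1-D feature map into n_parts contiguous sub-signals (Eq. 2)."""
    l = len(feature_map)
    step = l // n_parts                       # sub-signal length l/N (truncated)
    return [feature_map[i * step:(i + 1) * step] for i in range(n_parts)]

def resize(sub, target_len):
    """Normalize a sub-signal to a fixed length by linear interpolation (our choice)."""
    x_old = np.linspace(0.0, 1.0, num=len(sub))
    x_new = np.linspace(0.0, 1.0, num=target_len)
    return np.interp(x_new, x_old, sub)

feature_map = np.random.default_rng(0).standard_normal(198)
scales = [3, 5, 7, 9, 11]                     # the five scales used in the paper
sub_signals = [resize(s, 16)                  # 16 is an illustrative target length
               for n in scales
               for s in partition(feature_map, n)]
# 3 + 5 + 7 + 9 + 11 = 35 fixed-length sub-signal descriptors per feature map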

In the next section, the experiments of this research are described comprehensively.

4. Experiments

In this section, we first explain the importance of collecting a new dataset and the reasons why we did not use previous datasets. Then, we compare our proposed CNN model with the SVM (Support Vector Machine) and Random Forest algorithms, which are state-of-the-art general classifiers. We also explain the implementation parameters of all models in detail.

4.1. Datasets

There are several existing datasets that could be considered for this investigation. These datasets contain different numbers of samples, all recorded with Argus, and are listed as follows:

• The CMU dataset: the traffic was recorded at the edge routers of the wired CMU campus network over 6 weeks in 2007; the tested browsers are Internet Explorer, FireFox, Safari and Opera, and each flow is recorded for 10 seconds [41].

• The PlanetLab-Native dataset: the data was gathered by visiting the 150 most popular websites in the U.S., collected on Linux with the FireFox and Opera browsers; each flow is recorded for 30 seconds [42].

• The PlanetLab-QEMU dataset: the samples were recorded by visiting the same 150 websites as above on Windows, using the Internet Explorer, FireFox, Opera and Safari browsers; each flow is recorded for 30 seconds [43].

In this paper we take advantage of a specific view of the flow, the I/O graph signal, and this feature is not available in the datasets above. That is the main reason why we collected our own dataset. The complete data collection process is explained in detail in the next section.

4.1.1. The collected browser identification dataset

Generally, only a few features of a net flow are recordable by network monitoring tools. The three most well-known examples are: (1) the number of packets, (2) the number of bytes, and (3) the duration of the net


flow (in seconds). There are further features, such as the average packets per second or the average packet size, but they can be calculated from the three mentioned features. In this investigation, we call them the "main features". All data of a net flow is recorded in a pcap file by a network monitoring tool.

Over a period of two months, a comprehensive dataset was collected from the 500 most popular websites in China (according to Alexa). Each website was monitored as a single net flow for 20 seconds using each of the four popular web browsers: Google Chrome, FireFox, Internet Explorer, and Opera. All data were collected on Windows OS. A very interesting observation in this research is that each browser shows a distinctly specific behavior in its I/O graph signal. The I/O graph of a net flow illustrates the packets-per-second value over time as a signal. Obtaining the I/O graph of a net flow requires the corresponding pcap file, and this paper is the first to use the I/O graph as a feature vector. Therefore, collecting a new dataset was inevitable, and for this reason we could not test our method on other datasets. Fig. 4 illustrates the average of all instances for each class.

Figure 4: The average of all I/O graph signals for each class (Internet Explorer, FireFox, Google Chrome, Opera); y-axis: packets, x-axis: time in 100-millisecond intervals.

The time duration of each net flow in our dataset is set to 20 seconds, but the length of the I/O graph signal depends on the time interval parameter. If the interval is set to 0.1, there are 10 values for each second and the length of the I/O graph is 200. The time interval for all net flows in our dataset has been set to 0.1, so the length of each I/O graph signal is 200. Each of the 500 websites has been visited with the 4 browsers; hence there are 2000 samples (500 × 4), and accordingly a matrix of size 2000 × 200 has been labeled with 4 classes. To obtain a reliable dataset, this operation has been repeated 50 times, inside and outside of the university campus, with different internet lines and on different machines.
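As a concrete illustration of how an I/O graph signal could be derived from a capture, the sketch below bins packet arrival times into 0.1-second intervals over a 20-second flow, giving a length-200 packets-per-interval vector; the helper and the synthetic timestamps are ours, since the paper does not specify its extraction tooling beyond recording pcap files.

import numpy as np

def io_graph(timestamps, duration=20.0, interval=0.1):
    """Packets-per-interval signal (the I/O graph) for one net flow.

    timestamps: packet arrival times in seconds, relative to the first packet.
    Returns a vector of length duration/interval (200 for 20 s at 0.1 s).
    """
    n_bins = int(round(duration / interval))
    counts, _ = np.histogram(timestamps, bins=n_bins, range=(0.0, duration))
    return counts.astype(float)

# Illustrative usage with synthetic arrival times:
rng = np.random.default_rng(0)
ts = np.sort(rng.uniform(0.0, 20.0, size=1500))   # 1500 packets in 20 s
signal = io_graph(ts)                              # shape (200,)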

Finally, the dataset has been shaped into 50 matrices; each matrix has 2000 rows (samples) with 200 columns (raw features) and an extra column for the label. The websites are of different types designed for various purposes, e.g. search engines, online markets, news, media, online games, etc. The collected dataset is partitioned into two parts, a training set and a testing set, and we have tried to include all types of websites in both. The training set contains 350 of the 500 samples per class (70% of the data), and the remaining 150 samples (30%) form the testing set. In total, there are therefore 1400 samples in our training set and 600 samples in our testing set.

In this research, three different algorithms are evaluated on two types of data. For each net flow (training example), our dataset consists of two parts: the main features (mentioned above) and the I/O graph signal. Moreover, the corresponding bytes-per-second signal for each I/O graph signal is provided. Table 1 shows all main features and their units for a sample training example.

Table 1: The main features of a sample net flow (unique features are marked with *)

Feature                                    Unit
Packets (*)                                Packets
Between first and last packet (time) (*)   Seconds
Average packets per second                 Packets/second
Average packet size                        Bytes
Bytes (*)                                  Bytes
Average bytes per second                   Bytes/second
Average Megabits per second                Megabits/second
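The per-class 70/30 split described above (350 training and 150 testing websites out of 500 for each of the four browsers) could be realized as in the following sketch; the function and array names are ours and only illustrate the procedure.

import numpy as np

def split_per_class(X, y, train_frac=0.7, seed=0):
    """Split a 2000 x 200 feature matrix into 70% training / 30% testing per class."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        rng.shuffle(idx)
        cut = int(train_frac * len(idx))      # 350 of the 500 samples per browser
        train_idx.extend(idx[:cut])
        test_idx.extend(idx[cut:])
    return X[train_idx], y[train_idx], X[test_idx], y[test_idx]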

The original captured I/O graph signal only contains the packets-per-second values over time. One of the most important features that should also be considered here is the packet size: during a net flow, the size of the packets changes, and our solution is to capture the pattern of this change. Hence, the bytes-per-second signal was extracted from our data as well, in order to be combined with the I/O graph signal. According to Table 1, we have three main features which are unique, and the captured I/O graph data does not contain any information about the packet size. We therefore combined the new signal with the original captured I/O graph signal by dividing the bytes-per-second values of the new signal by the corresponding values in the original I/O graph signal. The newly obtained signal, which has the same length as the original I/O graph signal, is called the combined I/O graph signal in this research.
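A sketch of this combination step, assuming both the packets-per-interval and bytes-per-interval signals have already been extracted for the same flow: dividing bytes by packets in each interval yields the average-packet-size-over-time signal that the authors call the combined I/O graph signal; the handling of empty intervals is our choice.

import numpy as np

def combined_io_graph(packets, bytes_):
    """Element-wise bytes-per-interval / packets-per-interval (average packet size).

    packets, bytes_: length-200 vectors for the same net flow.
    Intervals with no packets are mapped to 0 to avoid division by zero (our choice).
    """
    packets = np.asarray(packets, dtype=float)
    bytes_ = np.asarray(bytes_, dtype=float)
    out = np.zeros_like(packets)
    np.divide(bytes_, packets, out=out, where=packets > 0)
    return out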

Afterward, the multi-class SVM and Random Forest models, together with our proposed CNN model, were applied to both the original captured I/O graph signal data and the combination of the I/O graph signal with the corresponding bytes-per-second signal. Finally, four different results are obtained for a comprehensive comparison.

4.2. Implementation details

Several parameters control the operation of a CNN model. In our proposed model, the number of convolutional and max-pooling layers is set to 5; the reason for choosing 5 layers is discussed in the following parts. In our method, the input data is a signal instead of an image; consequently, instead of a window size (WS), we have a slice size (SS). In addition, we have a kernel field size (K) as in all CNN models, but it is defined in one dimension. The remaining parameters are the stride (S), which denotes the step of the slice movement along the signal, and the zero-padding size (P), which is used at the border of the slice. Table 2 gives the parameter configuration of our proposed CNN model for all layers.

Table 2: The details of the proposed model. In convolutional layers, the parameters 'K', 'S' and 'P' refer to kernel, stride and padding; 'SS' is the slice size in max-pooling layers.

Layer                Configuration
1st convolutional    Maps: 16, K: 3, S: 1, P: 1
1st max pooling      SS: 2
2nd convolutional    Maps: 32, K: 5, S: 1, P: 1
2nd max pooling      SS: 4
3rd convolutional    Maps: 64, K: 7, S: 1, P: 1
3rd max pooling      SS: 6
4th convolutional    Maps: 128, K: 9, S: 1, P: 1
4th max pooling      SS: 8
5th convolutional    Maps: 256, K: 11, S: 1, P: 1
5th max pooling      SS: 10
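For orientation, a PyTorch sketch of a five-stage stack with the channel counts and kernel sizes of Table 2 is given below. Applied literally to a length-200 input, Table 2's settings (padding 1 for every convolution, pooling slice sizes 2 to 10) would collapse the signal before the fifth stage, so the sketch uses 'same' padding (K//2) and a pooling window/stride of 2 to keep the shapes consistent with Fig. 9; these two choices are our assumptions, not the authors' exact configuration.

import torch
import torch.nn as nn

# Five conv + max-pooling stages; channel counts and kernel sizes from Table 2,
# padding and pooling window/stride chosen by us so the forward pass is valid.
feature_extractor = nn.Sequential(
    nn.Conv1d(1, 16, kernel_size=3, stride=1, padding=1), nn.ReLU(),
    nn.MaxPool1d(kernel_size=2, stride=2),
    nn.Conv1d(16, 32, kernel_size=5, stride=1, padding=2), nn.ReLU(),
    nn.MaxPool1d(kernel_size=2, stride=2),
    nn.Conv1d(32, 64, kernel_size=7, stride=1, padding=3), nn.ReLU(),
    nn.MaxPool1d(kernel_size=2, stride=2),
    nn.Conv1d(64, 128, kernel_size=9, stride=1, padding=4), nn.ReLU(),
    nn.MaxPool1d(kernel_size=2, stride=2),
    nn.Conv1d(128, 256, kernel_size=11, stride=1, padding=5), nn.ReLU(),
    nn.MaxPool1d(kernel_size=2, stride=2),
)

x = torch.randn(8, 1, 200)          # a batch of 8 I/O graph signals of length 200
features = feature_extractor(x)     # hyper features for the downstream classifier
print(features.shape)               # torch.Size([8, 256, 6])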

The main reason for choosing multi-scale sub-signal sizes in the proposed model is that the I/O graph data we use is a time-varying signal, and factors such as the speed of the internet line or the time of day at which the data was collected affect the result. Using different sub-signal sizes gives us a model that is robust against such distortions and able to detect both coarser and finer patterns.

At the last step of the proposed model, there is a nonlinear multi-class classifier. All the sub-signal representations are fed into the classification

sub-network. This classifier is a sub-network of the proposed model, and it takes as input the hyper features extracted from the max-pooling layers. In the next part, the experimental results are described.

4.3. Training of the proposed model

As described in the Methodology section, our proposed deep architecture consists of different sub-tasks. Our network is initialized with VGG Net [39], and all remaining matrices are initialized randomly. Using the sub-gradient method with momentum, our goal is to minimize the loss function; the momentum is set to 0.9 for all tests. The labels of the samples in the dataset serve as the ground truth. The proposed multi-scale CNN is designed for the browser identification task. The loss function in our training step is defined as:

LF = \frac{1}{L}\sum_{i=1}^{L}\left(d_i^2 + (\nabla d_i)^2\right) - \frac{1}{2L^2}\left(\sum_{i=1}^{L} d_i\right)^2    (5)

where i is the index of a value in the input signal, L is the length of the input signal, d_i is the difference between the ground truth and the calculated output, and ∇d_i is the gradient of the difference d_i over the input signal.
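Under the reconstruction of Eq. (5) given above, which resembles a scale-invariant error with a gradient term, a direct numpy transcription would look as follows; treat it as a sketch of the reconstructed formula rather than the authors' implementation.

import numpy as np

def loss_fn(pred, target):
    """Eq. (5) as reconstructed above, with d_i = pred_i - target_i over one signal."""
    d = np.asarray(pred, dtype=float) - np.asarray(target, dtype=float)
    grad_d = np.gradient(d)                       # discrete gradient of the difference
    L = d.size
    return (np.mean(d ** 2 + grad_d ** 2)         # (1/L) * sum(d_i^2 + (grad d_i)^2)
            - (d.sum() ** 2) / (2 * L ** 2))      # - (1/(2 L^2)) * (sum d_i)^2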

4.4. Experimental results

After feature extraction with the CNN and the formation of the new hyper features, we classify our instances. As expected, the accuracy of our proposed model outperforms the traditional classifiers (SVM and Random Forest); there are several possible explanations for this result. For illustration, we applied our proposed CNN model, as well as SVM and Random Forest, to both the original I/O graph data and the combined I/O graph data. The Receiver Operating Characteristic (ROC) curve is a graphical way to illustrate the efficiency of classifiers. The ROC curves of all algorithms applied to the original I/O graph data are plotted in Fig. 5. This result shows that our proposed model outperforms both SVM and Random Forest by a wide margin. The accuracies of SVM and Random Forest are at the same level, with SVM slightly better than Random Forest. We repeated the test under the same conditions on the combined I/O graph data; the ROC curves of the algorithms are plotted in Fig. 6. Comparing these two results shows that although the combined I/O graph data improves our model by only 2%, it has a high impact on the other algorithms (a 23% improvement for SVM and a 30% improvement for Random Forest).
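For reference, per-class ROC curves and AUC values of the kind reported in Figs. 5 and 6 can be obtained in a one-vs-rest fashion with scikit-learn, as sketched below; the sketch assumes score outputs from any of the three classifiers and is not tied to the authors' evaluation code.

import numpy as np
from sklearn.metrics import roc_curve, auc
from sklearn.preprocessing import label_binarize

def per_class_roc(y_true, y_score, classes):
    """One-vs-rest ROC curve and AUC for each browser class.

    y_true:  shape (n_samples,) labels
    y_score: shape (n_samples, n_classes) scores or probabilities
    """
    y_bin = label_binarize(y_true, classes=classes)
    curves = {}
    for k, cls in enumerate(classes):
        fpr, tpr, _ = roc_curve(y_bin[:, k], y_score[:, k])
        curves[cls] = (fpr, tpr, auc(fpr, tpr))
    return curves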

Figure 5: The ROC curve for each algorithm on the original I/O graph data (SVM: area = 0.578; Random Forest: area = 0.528; proposed model: area = 0.958).

The observed increase in the accuracies of SVM and Random Forest can be attributed to the consideration of all unique features of the net flows. In other words, the combined I/O graph data contains the "packet transfer rate", the "data transfer rate" and the "time" as unique features, whereas the original collected data only contains the "packet transfer rate" and the "time". It can be inferred that the signal of packet size over time is also embedded in the combined I/O graph data.

In Table 3, there is a clear trend of increasing precision. In general, we can see a significant improvement in precision when comparing our proposed CNN model with the traditional classifiers (SVM and Random Forest); the accuracy of each class is presented in detail. According to Table 3, there are slight similarities between two groups of browsers: a higher similarity between Internet Explorer and Opera as the first group, and a comparable similarity between FireFox and Google Chrome. Table 4 shows the full confusion matrix (misclassification matrix) of our method on the combined I/O graph data. It is very important to know, when the system cannot classify an input sample correctly, to which class its answer belongs; this illustrates the similarity and difference between classes and is necessary for analyzing the result when the precision differs across classes.

As discussed before, the I/O graph signal, which represents the packets-per-second values over time, has been fed to our proposed CNN model as the raw input data.

Figure 6: The ROC curve for each algorithm on the combined I/O graph data (SVM: area = 0.821; Random Forest: area = 0.835; proposed model: area = 0.975).

As stated in the dataset section, the most important and unique main features of a net flow are the number of packets, the number of bytes and the duration. The I/O graph signal only refers to the number of packets over time; in other words, it contains two of the three. Combining the packets-per-second values with the corresponding bytes-per-second values (as described before) yields a new signal, the packet size over time. The rational solution is therefore to have both packets per second and bytes per second over time in our input data, which gives more information about the behavior of the browsers in a net flow. The achieved result reports an improvement of around 2% when the combined I/O graph signal is used instead of the original I/O graph signal alone.

Our experiments showed the impact of the combined I/O graph data on the results. We now present four different analyses of our proposed model, as listed below:

1. The impact of multiple scales for partitioning.
2. The impact of dataset size.
3. The impact of each convolutional layer individually.
4. The training time of all models.

These experiments help us to understand and analyze the system completely. All tests, results and discussions are presented in the following unified section, "Comprehensive Evaluation".

Table 3: The experimental results in detail.

Original I/O graph data:
                     SVM       Random Forest   Proposed CNN
Internet Explorer    57.45%    52.47%          96.04%
FireFox              58.03%    53.34%          95.92%
Google Chrome        56.77%    52.23%          95.07%
Opera                59.11%    53.28%          96.21%
Overall              57.84%    52.83%          95.81%

Combined I/O graph data:
                     SVM       Random Forest   Proposed CNN
Internet Explorer    81.71%    81.89%          96.71%
FireFox              83.23%    84.16%          97.82%
Google Chrome        81.52%    83.37%          97.49%
Opera                82.10%    84.66%          98.18%
Overall              82.14%    83.52%          97.55%

Table 4: The confusion matrix of the proposed model on combined I/O graph data (rows: actual class; columns: predicted class).

            Predicted IE   Predicted FF   Predicted GC   Predicted OP
Actual IE   96.71%         0.79%          1.03%          1.47%
Actual FF   0.73%          97.82%         0.78%          0.67%
Actual GC   0.66%          1.13%          97.49%         0.72%
Actual OP   0.87%          0.46%          0.49%          98.18%

16

scales, while keeping the configuration of all other layers fixed. In Fig. 7, the result of multi-scale partitioning is compared with the single-scale partitioning tests.

Figure 7: The result of partitioning with different single scales (K) and with multi-scale partitioning on the combined I/O graph data; y-axis: average accuracy, x-axis: partitioning scale.

Although in the case K = 9 the accuracy rises to about 80%, there is still a large gap between the multi-scale partitioning test and the single-scale tests. A possible explanation is that the behavioral signatures of browsers observable in the I/O graph signal can be stretched out along the time axis because of the internet speed; by using different partitioning scales, all such specific behaviors become recognizable by the network.

4.5.2. Impact of dataset size

In this section, the result of training our proposed model with different numbers of samples is illustrated. Our dataset originally has 500 samples for each class. The following numbers of samples per class are considered: 100, 200, 300, 400 and 500, so the total number of samples in each test is 400, 800, 1200, 1600 and 2000, respectively. The percentage of training and testing data is the same as mentioned in Section 4.1.1. Fig. 8 illustrates the effect of the dataset size on the accuracy for all the algorithms; each test is repeated 10 times and the average is reported.

Two important findings can be extracted here. The first is that the proposed CNN model using the combined I/O graph data shows better stability against changes in the data size than the model that uses only the I/O

Figure 8: The impact of dataset size on the result (accuracy vs. number of samples in the dataset) for SVM, Random Forest and the proposed model on both the original and the combined I/O graph data.

graph signal. The second finding is that the Random Forest algorithm can compete with SVM only when it uses a large number of samples. In general, there is a distinct gap between the accuracies of the traditional classifiers (SVM and Random Forest) on the combined I/O graph data and those of the same models on the original I/O graph data.

4.5.3. Impact of each convolutional layer individually

To achieve a comprehensive evaluation of our proposed CNN model, we repeated the test on the hyper features obtained from each max-pooling layer. After each max-pooling layer, the classification test is performed by feeding the obtained hyper features to a fully connected network. Fig. 9 shows the architecture of this test, in which the shape of the data at each stage is also given. For each max-pooling layer, the 3 best hyper features are chosen in order to visualize the distribution of the samples in a 3-dimensional space; the score function for selecting the best features is chi-squared (chi2). The test is performed in two main stages: the first on the I/O graph data and the second on the combined I/O graph data. There are 6 visualized distributions for each of the two tests: the first shows the distribution of the input data (unprocessed data) and the five others show the distribution of the data with respect to the hyper features of the corresponding layer. Fig. 10 illustrates the result for the I/O graph data. According to Fig. 10, the samples of the input data are highly nested together.
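The three-feature selection used for these visualizations can be reproduced with scikit-learn's chi-squared scoring, as sketched below; chi2 requires non-negative inputs, so the min-shift is our addition and the function names are ours.

import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

def best_three_features(hyper_features, labels):
    """Pick the 3 hyper features with the highest chi-squared score for plotting.

    hyper_features: shape (n_samples, n_features) outputs of a max-pooling layer
    labels:         shape (n_samples,) browser classes
    """
    X = np.asarray(hyper_features, dtype=float)
    X = X - X.min(axis=0)                      # chi2 needs non-negative values
    selector = SelectKBest(score_func=chi2, k=3).fit(X, labels)
    return selector.transform(X), selector.get_support(indices=True)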

Figure 9: The architecture of the comprehensive test. The input signal of shape [1 × 200] passes through five convolutional/max-pooling stages (pooled feature-map shapes [16 × 98], [32 × 48], [64 × 24], [128 × 12] and [256 × 6]); after each max-pooling layer, a fully connected layer performs a separate classification test (Tests #1 to #5).

That is why the traditional classification algorithms (SVM and Random Forest) cannot classify these samples very well. It is evident that, as convolutional and max-pooling layers are added, the samples gradually separate. Fig. 11 shows the result for the combined I/O graph data and reports the same trend as Fig. 10, with some differences.

In image classification tasks, Convolutional Neural Networks can extract hyper features that are meaningful to humans (e.g. eyes and nose in face recognition). In this research we used the same approach, but the obtained hyper features have no physical meaning for us; in other words, the proposed CNN model extracted latent signs of each class in the input signals as the hyper features. The output of the fully connected classifier after each max-pooling layer helps us to understand how the model works. The classification results for both tests are shown in Table 5. According to Table 5, adding convolutional and max-pooling layers to the proposed network gradually increases the accuracy of the classification task, as expected. In addition, the accuracy of the proposed model on the combined I/O graph data is higher than that of the same model on the I/O graph data.

4.5.4. Training time of all models

One of the most important factors in machine learning systems is the training time, i.e. the time complexity of the learning algorithm; the algorithm should not

Figure 10: The distribution visualization (3-D scatter plots of the three best features) for the original I/O graph data: the input data and the outputs of max-pooling layers #1 to #5.

require more than polynomial time. The traditional classifiers were run on the dataset alongside all five tests of the comprehensive evaluation shown in Fig. 9. In this evaluation we only run the algorithms on one of the datasets (the combined I/O graph data), because both datasets have the same size and the choice has no effect on the training time; here only the training time matters and the accuracy is not considered. To measure the training time of all algorithms, we performed each test 10 times and report the average. Fig. 12 illustrates the average training time of all algorithms. The result of this test shows that SVM and Random Forest have approximately the same level of time consumption, which is expected. In addition, there are larger

Figure 11: The distribution visualization (3-D scatter plots of the three best features) for the combined I/O graph data: the input data and the outputs of max-pooling layers #1 to #5.

differences between the first convolutional layers than between the last ones. These differences can be explained in part by the reduction in the number of hyper features in the last convolutional layers.

4.6. Limitations of the proposed CNN model

Our proposed method has two kinds of limitations:

• Misclassification on highly noisy data.
• Misclassification when the number of learning samples is low.

Table 5: The experimental results for the two tests in each layer

                  I/O graph data   Combined I/O graph data
Input data        64.23%           71.41%
Max-pooling #1    73.31%           81.24%
Max-pooling #2    81.67%           87.13%
Max-pooling #3    88.40%           92.86%
Max-pooling #4    92.77%           95.06%
Max-pooling #5    95.81%           97.55%

Figure 12: The training time (in seconds) of all models in the comprehensive evaluation: SVM, Random Forest, and Convolution Tests 1 to 5 (T1 to T5).

Although our proposed model outperforms the other classifiers, there are some failures in the worst cases. In particular, in unstable networks, the I/O graph signal of the net flow exhibits high noise in the input signals. This can be illustrated briefly by an example: over a weak connection, the noise in the data transfer can make the I/O graph signal of a particular browser (e.g. Internet Explorer) look more similar to the signal of another browser (e.g. Opera). This causes the system to learn the inputs in a wrong way and leads to misclassifications.

Our tests showed that the proposed model works efficiently when there are at least 300 training samples per class in the dataset. When the number of

training samples is too low, the probability of overfitting increases; it is better to train the proposed model with the largest possible number of training samples. In the next section, the conclusion and a discussion of the proposed algorithm and the traditional classifiers are presented.

5. Conclusion and discussion

This paper examines the problem of passive browser detection from I/O profiles for the first time. The I/O profile of a browser is captured as a time series of packets per second sent or received by the browser, and each browser is uniquely fingerprinted based on its I/O graph signal. Additionally, the corresponding bytes-per-second signal is combined with the original I/O graph signal; feeding this new data to the proposed model yields a better accuracy. The paper proposes a Convolutional Neural Network (CNN) model with multi-scale sub-signal sizes. Our experiments illustrate that the proposed model outperforms the traditional general classifiers on the browser fingerprinting problem. Browser version detection is the next step of this research. Another step is to compare other deep learning models with our proposed CNN model. Moreover, generalizing the proposed algorithm to other similar problems with standard public datasets is planned.

References

[1] T.-F. Yen, M. K. Reiter, Traffic Aggregation for Malware Detection, Springer Berlin Heidelberg, Berlin, Heidelberg, 2008, pp. 207–227. doi:10.1007/978-3-540-70542-0_11.

[2] X. Gui, J. Liu, M. Chi, C. Li, Z. Lei, Analysis of malware application based on massive network traffic, China Communications 13 (8) (2016) 209–221. doi:10.1109/CC.2016.7563724.

[3] J. Yang, W. Xiong, Sh. Li, Ch. Xu, Learning structured and non-redundant representations with deep neural networks, Pattern Recognition 86 (2019) 224–235. doi:10.1016/j.patcog.2018.08.017.

[4] K. Boda, A. M. Földes, G. G. Gulyás, S. Imre, User tracking on the web via cross-browser fingerprinting, in: Proceedings of the 16th Nordic Conference on Information Security Technology for Applications, NordSec'11,

Springer-Verlag, Berlin, Heidelberg, 2012, pp. 31–46. doi:10.1007/978-3-642-29615-4_4.

[5] N. Tajbakhsh, J. Y. Shin, S. R. Gurudu, R. T. Hurst, C. B. Kendall, M. B. Gotway, J. Liang, Convolutional Neural Networks for Medical Image Analysis: Full Training or Fine Tuning?, IEEE Transactions on Medical Imaging 35 (5) (2016) 1299–1312. doi:10.1109/TMI.2016.2535302.

[6] K. Takeda, User identification and tracking with online device fingerprints fusion, in: 2012 IEEE International Carnahan Conference on Security Technology (ICCST), 2012, pp. 163–167. doi:10.1109/CCST.2012.6393552.

[7] D. E. Comer, J. C. Lin, Probing TCP implementations, in: USENIX Summer 1994 Conference, 1994, pp. 245–255.

[8] F. Xiong, B. Sun, X. Yang, H. Qiao, K. Huang, A. Hussain, Z. Liu, Guided Policy Search for Sequential Multitask Learning, IEEE Transactions on Systems, Man, and Cybernetics: Systems (2018) 1–11. doi:10.1109/TSMC.2018.2800040.

[9] J. Pahdye, S. Floyd, On inferring TCP behavior, SIGCOMM Comput. Commun. Rev. 31 (4) (2001) 287–298. doi:10.1145/964723.383083.

[10] D. Fried, K. Piwowarski, W. Streilein, Passive operating system identification from TCP/IP packet headers, in: Proceedings of the ICDM Workshop on Data Mining for Computer Security (DMSEC), 2003.

[11] L. Zhang, P. N. Suganthan, Visual Tracking with Convolutional Neural Network, in: 2015 IEEE International Conference on Systems, Man, and Cybernetics, pp. 2072–2077. doi:10.1109/SMC.2015.362.

[12] R. Beverly, A Robust Classifier for Passive TCP/IP Fingerprinting, in: Proceedings of the 5th Passive and Active Measurement (PAM) Workshop, 2004, pp. 158–167.

[13] T. F. Yen, X. Huang, F. Monrose, M. K. Reiter, Browser Fingerprinting from Coarse Traffic Summaries: Techniques and Implications, Springer Berlin Heidelberg, Berlin, Heidelberg, 2009, pp. 157–175. doi:10.1007/978-3-642-02918-9_10.

[14] J. Zhicheng, G. Xinbo, W. Ying, L. Jie, X. Haojun, Deep Convolutional Neural Networks for mental load classification based on EEG data, Pattern Recognition 88 (2019) 38–49. doi:10.1016/j.patcog.2018.11.002.

[15] H. Jiang, H. Zhang, Y. Luo, J. Han, Neural-Network-Based Robust Control Schemes for Nonlinear Multiplayer Systems With Uncertainties via Adaptive Dynamic Programming, IEEE Transactions on Systems, Man, and Cybernetics: Systems (2018) 1–10. doi:10.1109/TSMC.2018.2810117.

[16] N. Quadrianto, Z. Ghahramani, A Very Simple Safe-Bayesian Random Forest, IEEE Transactions on Pattern Analysis and Machine Intelligence 37 (6) (2015) 1297–1303. doi:10.1109/TPAMI.2014.2362751.

[17] V. Paxson, Automated packet trace analysis of TCP implementations, in: Proceedings of the ACM SIGCOMM '97 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication, SIGCOMM '97, ACM, New York, NY, USA, 1997, pp. 167–179. doi:10.1145/263105.263160.

[18] M. H. Jao, M. H. Hsieh, K. H. He, D. H. Liu, S. Y. Kuo, T. H. Chu, Y. H. Chou, A Wormhole Attacks Detection Using a QTS Algorithm with MA in WSN, in: 2015 IEEE International Conference on Systems, Man, and Cybernetics, pp. 20–25. doi:10.1109/SMC.2015.17.

[19] M. Crotti, M. Dusi, F. Gringoli, L. Salgarelli, Traffic classification through simple statistical fingerprinting, SIGCOMM Comput. Commun. Rev. 37 (1) (2007) 5–16. doi:10.1145/1198255.1198257.

[20] F. Hernandez-Campos, A. B. Nobel, F. D. Smith, K. Jeffay, Understanding patterns of TCP connection usage with statistical clustering, in: 13th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems, 2005, pp. 35–44. doi:10.1109/MASCOTS.2005.75.

[21] M. Roughan, S. Sen, O. Spatscheck, N. Duffield, Class-of-service mapping for QoS: A statistical signature-based approach to IP traffic classification, in: Proceedings of the 4th ACM SIGCOMM Conference on Internet Measurement,


IMC '04, ACM, New York, NY, USA, 2004, pp. 135–148. doi:10.1145/1028788.1028805.

[22] M. P. Collins, M. K. Reiter, Finding Peer-to-Peer File-Sharing Using Coarse Network Behaviors, Springer Berlin Heidelberg, Berlin, Heidelberg, 2006, pp. 1–17. doi:10.1007/11863908_1.

[23] Y. Xue, E. Y. Vasserman, Simple and compact flow fingerprinting robust to transit through low-latency anonymous networks, in: 2016 13th IEEE Annual Consumer Communications Networking Conference (CCNC), 2016, pp. 765–773. doi:10.1109/CCNC.2016.7444875.

[24] D. Fifield, A. Geana, L. MartinGarcia, M. Morbitzer, J. Tygar, Remote operating system classification over IPv6, in: Proceedings of the 8th ACM Workshop on Artificial Intelligence and Security, AISec '15, ACM, New York, NY, USA, 2015, pp. 57–67. doi:10.1145/2808769.2808777.

[25] P. Matoušek, O. Ryšavý, M. Grégr, M. Vymlátil, Towards identification of operating systems from the internet traffic: IPFIX monitoring with fingerprinting and clustering, in: DCNET 2014, Proceedings of the 5th International Conference on Data Communication Networking, SciTePress, 2014, pp. 21–27.

[26] M. Husák, M. Čermák, T. Jirsík, P. Čeleda, HTTPS traffic analysis and client identification using passive SSL/TLS fingerprinting, EURASIP Journal on Information Security 2016 (1) (2016) 6. doi:10.1186/s13635-016-0030-7.

[27] C. Gatta, E. Puertas, O. Pujol, Multi-scale stacked sequential learning, Pattern Recognition 44 (10–11) (2011) 2414–2426. doi:10.1016/j.patcog.2011.04.003.

[28] R. Upathilake, Y. Li, A. Matrawy, A classification of web browser fingerprinting techniques, in: 2015 7th International Conference on New Technologies, Mobility and Security (NTMS), 2015, pp. 1–5. doi:10.1109/NTMS.2015.7266460.

[29] Y. Cao, S. Li, E. Wijmans, (Cross-)browser fingerprinting via OS and hardware level features, in: Proceedings of the Network and Distributed System Security Symposium (NDSS), 2017.

[30] O. Abdel-Hamid, A. R. Mohamed, H. Jiang, L. Deng, G. Penn, D. Yu, Convolutional Neural Networks for Speech Recognition, IEEE/ACM Transactions on Audio, Speech, and Language Processing 22 (10) (2014) 1533–1545. doi:10.1109/TASLP.2014.2339736.

[31] P. Fournier-Viger, J. C.-W. Lin, R. U. Kiran, Y. S. Koh, R. Thomas, A survey of sequential pattern mining, Data Science and Pattern Recognition (DSPR) 1 (1) (2017) 54–77.

[32] Y. Bengio, Learning deep architectures for AI, Foundations and Trends in Machine Learning 2 (1) (2009) 1–127. doi:10.1561/2200000006.

[33] A. Graves, Practical variational inference for neural networks, in: J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. Pereira, K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems 24, Curran Associates, Inc., 2011, pp. 2348–2356.

[34] A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet classification with deep convolutional neural networks, in: F. Pereira, C. J. C. Burges, L. Bottou, K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems 25, Curran Associates, Inc., 2012, pp. 1097–1105.

[35] K. He, X. Zhang, S. Ren, J. Sun, Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition, Springer International Publishing, Cham, 2014, pp. 346–361. doi:10.1007/978-3-319-10578-9_23.

[36] Y. Lecun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (11) (1998) 2278–2324. doi:10.1109/5.726791.

[37] R. Girshick, Fast R-CNN, in: The IEEE International Conference on Computer Vision (ICCV), 2015.

[38] G. Wang, G. Zhang, K. S. Choi, J. Lu, Deep Additive Least Squares Support Vector Machines for Classification With Model Transfer, IEEE Transactions on Systems, Man, and Cybernetics: Systems (2017) 1–14. doi:10.1109/TSMC.2017.2759090.


[39] K. Simonyan, A. Zisserman, Very Deep Convolutional Networks for Large-Scale Image Recognition, Computing Research Repository, abs/1409.1556, 2014.

[40] R. Ptucha, F. Petroski Such, S. Pillai, F. Brockler, V. Singh, P. Hutkowski, Intelligent character recognition using fully convolutional neural networks, Pattern Recognition 88 (2019) 604–613. doi:10.1016/j.patcog.2018.12.017.

[41] T.-F. Yen, X. Huang, F. Monrose, M. K. Reiter, Browser Fingerprinting from Coarse Traffic Summaries: Techniques and Implications, in: Proceedings of the 6th International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment, 2009, pp. 157–175. doi:10.1007/978-3-642-02918-9_10.

[42] B. Chun, D. Culler, T. Roscoe, A. Bavier, L. Peterson, M. Wawrzoniak, M. Bowman, PlanetLab: An Overlay Testbed for Broad-coverage Services, SIGCOMM Comput. Commun. Rev. (July 2003) 3–12. doi:10.1145/956993.956995.

[43] F. Bellard, QEMU, a Fast and Portable Dynamic Translator, in: Proceedings of the Annual Conference on USENIX Annual Technical Conference, 2005, pp. 41–41.

[44] Z. Jia, X. Cui, Q. Liu, X. Wang, C. Liu, Micro-Honeypot: Using Browser Fingerprinting to Track Attackers, in: 2018 IEEE Third International Conference on Data Science in Cyberspace (DSC), 2018, pp. 197–204. doi:10.1109/DSC.2018.00036.

[45] K. Tanabe, H. Ryohei, S. Takamichi, Combining Features in Browser Fingerprinting, in: Proceedings of the 13th International Conference on Broadband and Wireless Computing, Communication and Applications (BWCCA-2018), 2018, pp. 671–681. doi:10.1007/978-3-030-02613-4_60.

[46] J. Queiroz, E. L. Feitosa, A Web Browser Fingerprinting Method Based on the Web Audio API, The Computer Journal (2019) 1106–1120. doi:10.1093/comjnl/bxy146.


Conflict of Interest

xjtu.edu.cn, Xi'an Jiaotong University, China. cert.org.cn, National Computer Network Emergency Response Technical Team/Coordination Center of China.


Saeid Samizade obtained his BSc and MSc in Software Engineering and Artificial Intelligence (AI) from Qazvin Azad University (QIAU) in Iran. He is currently a Ph.D. student at Xi'an Jiaotong University. His research interest is focused on the intersection of security and state-of-the-art machine learning (ML) models.

Chao Shen (S'09-M'14) is currently a Professor in the School of Electronic and Information Engineering and Associate Dean of the School of Cyberspace Security, Xi'an Jiaotong University, China. His research interests include network security, insider detection, and behavioral biometrics.

Chengxiang Si is currently a researcher with the National Computer Network Emergency Response Technical Team/Coordination Center of China. His research interests include computer security and insider/intrusion detection.

Xiaohong Guan (S'89-M'93-SM'94-F'07) received the Ph.D. degree in electrical engineering from the University of Connecticut, Storrs, in 1993. Since 1995, he has been with the Systems Engineering Institute, Xi'an Jiaotong University, Xi'an, China, where he is currently a Cheung Kong Professor of Systems Engineering and the Dean of the School of Electronic and Information Engineering. He is also with the Department of Automation, Tsinghua National Laboratory for Information Science and Technology, and the Center for Intelligent and Networked Systems, TNLIST, Tsinghua University, Beijing, China. His research interests include the allocation and scheduling of complex networked resources, and network security.