Journal of Information Security and Applications 49 (2019) 102388
Active learning approach to label network traffic datasets

Jorge L. Guerra Torres a,∗, Carlos A. Catania b, Eduardo Veas c

a Institute for Information Technology and Communications, National University of Cuyo, Mendoza, Argentina
b LABSIN, School of Engineering, National University of Cuyo, Mendoza, Argentina
c Institute of Interactive Systems and Data Science, Graz University of Technology, Graz, Austria
Keywords: Active learning; Labeling network; Random Forest; Learning rate; Noise robustness
Abstract

In the field of network security, the process of labeling a network traffic dataset is especially expensive, since expert knowledge is required to perform the annotations. With the aid of visual analytic applications such as RiskID, the effort of labeling network traffic is considerably reduced. However, since the label assignment still requires an expert pondering several factors, the annotation process remains a difficult task. The present article introduces a novel active learning strategy for building a random forest model based on the connections previously labeled by the user. The resulting model provides the user with an estimate of the class probability of the remaining unlabeled connections, helping them in the traffic annotation task. The article describes the active learning strategy, the interfaces with the RiskID system, the algorithms used to predict botnet behavior, and a proposed evaluation framework. The evaluation framework includes studies to assess not only the prediction performance of the active learning strategy, but also the learning rate and resilience against noise, as well as the improvements over other well-known labeling strategies. The framework represents a complete methodology for evaluating the performance of any active learning solution. The evaluation results showed that the proposed approach is a significant improvement over previous labeling strategies.
1. Introduction

This paper describes an intelligent tool to aid the network security expert in the task of labeling network data. Computer networks have become indispensable for exchanging information among people and organizations; therefore, security is a major challenge nowadays. Beyond user authentication, data encryption and firewalls, network intrusion detection systems (NIDS) are widely used as an active defense for the network environment. An NIDS is an active process that monitors network traffic to identify security breaches (e.g., Botnet behavior) and initiate countermeasures. NIDSs require a way to adapt to a fast-changing environment or they risk becoming obsolete. Intelligence-based detection systems deal with the fast evolution of network scenarios using machine learning techniques [1]. Before deploying it in any real-world environment, an intelligence-based NIDS must be trained and evaluated using real labeled network traffic traces with an extensive set of intrusions or attacks [2]. Hereby, one of the most significant issues during the development of intelligence-based detection systems is the lack of appropriate
public datasets [3]. This issue originates from two major challenges: (i) network data contains sensitive information that organizations and individuals are not willing to disclose, and (ii) labeling all published data requires a major human effort, which can only be carried out by highly trained experts: security specialists. Regarding the sensitivity of network data (i), there are clearly high risks of disclosing private or classified information, whereby researchers frequently encounter insurmountable organizational and legal barriers when they attempt to provide datasets to the community [3]. The Stratosphere Intrusion Prevention System (IPS) [4] project comes as a response to the challenge of releasing network data without revealing sensitive information. The project aims to generate high-quality datasets, using a particular encoding of network behavior, for testing and developing new malware detection techniques. To address the human effort in the labeling task (ii), several techniques are employed, ranging from the automatic generation of labels [2,5–8] and semi-supervised labeling approaches [9,10] to the use of visual tools for the analysis of network traffic [11,12]. Despite all these attempts, the datasets most commonly used for evaluation are almost 12 years old, which makes them practically obsolete if we consider the fast evolution of the network security field [1]. Our contribution addresses the second challenge (ii) by building an intelligent visual analytics application (blending both strategies)
– RiskID – to assist the labeling of network traffic datasets. RiskID builds on the Stratosphere IPS encoding to ensure the anonymity of labeled network data. RiskID trains a classifier on the subset of already labeled connections and uses the classifier output to help the user in the label decision process. In particular, RiskID combines visualization strategies and active learning, working together to facilitate the recognition of malicious traffic. As a result, RiskID aims to promote the creation of properly labeled public network traffic datasets, which are so useful for the scientific community. Besides issues specific to network security, the use of active learning raises other challenges: (a) which algorithms are suitable for online incremental learning in the domain, and (b) how the performance should be evaluated. This paper makes two major contributions in the area of intelligence-assisted network security:

1. An active learning method using Random Forests to interactively assist the user in the labeling process.
2. The evaluation procedure needed to validate any active learning solution in real environments. We present a thorough evaluation framework to validate: learning rate, prediction performance, resilience against noise and impact on overall performance.

2. Related work

The lack of labeled public datasets in the network security field is a well-known problem and has been tackled from different angles. Synthetic datasets were created to represent certain problem domains, specific needs or conditions. Examples of known synthetic datasets are: KDDcup99 [5], built upon the data captured in the DARPA98 IDS evaluation program; DEFCON [6], which contains network traffic captured during a hacker competition called "Capture The Flag"; and the CAIDA dataset [7], which contains particular kinds of attacks, among others. Synthetic datasets are often very useful but suffer from excessive preprocessing that separates them from real network environments.

Real-life datasets. In an attempt to obtain real-life datasets, Bhuyan et al. [2] propose a systematic approach for automatically generating real-life network intrusion datasets at both packet and flow level. Another example of automatic label generation is proposed by Pius et al. [13], who applied clustering techniques to annotate unlabeled multivariate sensor data in smartphone networks. Mukkavilli et al. [8] used a similar approach. Their systematic approach is built upon an experimental platform used to represent the practical interaction between cloud users and cloud services. Hereby, they collect traces of network traffic resulting from the interaction between users and their cloud services, obtaining a real labeled dataset. These network traces from the cloud are readily shareable and can be interchanged among collaborators and researchers without major privacy issues. Clearly, if the researchers have control over the network conditions, or if the network traffic is artificially generated using network simulation software, the process of labeling is simplified. However, obtaining such control over the network environment is not always possible. Moreover, even in controlled networks, ensuring that the training datasets are correctly labeled or completely free of noisy information is extremely hard, since the labels are only as trustworthy as the assumed control over the network.
In addition, when the injected attack traffic and the background traffic come from different packet captures, it may be easier to identify the attack traffic if appropriate care was not taken when merging the captures [14].

Helping the user in the labeling process. Human experts are essential for annotating network traffic, but they are an expensive resource. Therefore, the labeling process should use expert time efficiently. Consequently, to reduce human effort in the labeling process it is
common to find two main approaches: (i) semi-automatic learning strategies and (ii) visual applications. (i) In the work of Aparicio et al. [10], the authors proposed an approach to automatically generate labeled network traffic datasets using an unsupervised anomaly-based IDS. The resulting labeled dataset was then processed using a Genetic Algorithm (GA) for selecting the main features. Other works focus on active learning to build a labeled dataset for intrusion detection. Active learning is an interactive process where a user interface is required for the expert to annotate. For instance, the Aladin project [15] applies rare category detection [16] on top of active learning to foster the discovery of the different families, and Gornitz et al. [9] use a k-nearest neighbor approach to detect yet unknown malicious connections. Gornitz et al. have only run simulations on fully labeled datasets with an oracle answering the annotation queries, and they do not mention any user interface to interact with users. On the other hand, the Aladin project has a corresponding graphical user interface, but the authors provide no detail about it. (ii) In another attempt to improve the process of manually labeling network connections, Soule et al. [12] propose a web-based software system that allows users to share, label, and inspect traffic time-series. This tool analyzes raw network traffic and, despite the visual tools that accompany collaborative tagging, the absence of supervised or semi-supervised assistance means that labeling a large dataset remains an arduous task. Beaugnon et al. [11] propose ILAB, a labeling strategy based on continuous interaction with the expert that mixes the two approaches: user interface and active learning. With a user interface, the expert is asked to annotate some instances from a large unlabeled pool to improve the current detection model and the relevance of the future annotation queries. The work of Beaugnon et al. is the closest to the approach discussed in the present article; for this reason we establish a comparison with it in Section 5.6. Our contribution differs from these previous works in that we combine visualization and learning in an intelligent visual analytics application (RiskID). In ILAB, the user interface is rather rudimentary, showing only basic features: start time, duration, source and destination IP and port, number of bytes and packets [11]. Our application describes flows as a feature vector based on the Stratosphere IPS encoding, which offers information about periodicity, duration and packet size. The visualization in RiskID offers distributions of such features and a similarity computation grouping them according to the feature vector. An important aspect to remark is that these strategies rely on correct labeling by the expert, so the quality of these labels influences the subsequent automatic labeling. Beaugnon et al. propose a new sampling strategy and compare it with other active learning approaches in terms of sampling bias and execution time. Instead, we present a comprehensive evaluation framework to validate: learning rate, prediction performance, resilience against noise and the impact on overall performance.

3. The RiskID application

RiskID is a visual analytics tool that combines visualization with statistical techniques to assist the user in the process of labeling network connections [17].
The application organizes overview and detail views of the network behavior to facilitate the exploration and discovery of possible threats. Fig. 1 illustrates the architecture of RiskID and its three main modules. The back-end is made of a Preprocessing Module and an Analytics Module. The process starts with a raw network traffic dataset, usually in pcap (packet capture) format.
Fig. 1. RiskID: Module interaction diagram.

Table 1
Symbol assignment strategy to encode network behavior.

                     Size:     Small                Medium               Large
                     Duration: Short  Med   Long    Short  Med   Long    Short  Med   Long
Strong Per.                    a      b     c       d      e     f       g      h     i
Weak Per.                      A      B     C       D      E     F       G      H     I
Weak Non-Per.                  r      s     t       u      v     w       x      y     z
Strong Non-Per.                R      S     T       U      V     W       X      Y     Z
No Data                        1      2     3       4      5     6       7      8     9

Time between flows: 0–5 s = "."   5–60 s = ","   60 s–5 min = "+"   5 min–1 h = "*"   Timeout (> 1 h) = "0"
The Preprocessing Module transforms a raw network traffic dataset into an internal format – a 10-dimensional feature vector – and passes it to the Analytics Module. To help users during labeling, the Analytics Module applies several statistical methods with the goal of grouping items in the vector space. In the front end, the Visual Analytics Module receives the feature vectors, statistics and grouping information and organizes them in overview and detail views following the Visual Information-Seeking Mantra [18].

3.1. Preprocessing module

The Preprocessing Module performs two conversion processes, each inside a specific submodule: the Network Pattern Extractor and the Feature Extractor submodules. The former takes care of anonymization and the latter of feature generation.

3.1.1. Network pattern extractor submodule

The Network Pattern Extractor Submodule implements the encoding proposed by the Stratosphere IPS project [4]. Such encoding is performed with two purposes: to reduce the usually considerable size of the network traffic data, and to guarantee data anonymity during the labeling process. The Stratosphere IPS encoding aggregates network flows according to a 4-tuple composed of the source IP address, the destination IP address, the destination port and the protocol. For each flow, the encoding considers information about the size, duration and periodicity of the packet exchange. It uses a character encoding as follows: each letter in the connection defines a 3-tuple with the characteristics <periodicity, duration, size>; each number indicates that there is not enough data to create a 3-tuple yet (it is normal to have numbers at the beginning of each SC). Finally, between the letters and numbers, a symbol indicates the time elapsed between consecutive flows [4]. Table 1 shows the symbol assignment strategy for encoding the network behavior according to Stratosphere IPS.
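As an illustration only, the following Python sketch decodes a single symbol of the behavioral encoding into its <periodicity, duration, size> 3-tuple, taking the mapping directly from Table 1; the function and variable names are ours and are not part of the Stratosphere IPS code.

```python
# A minimal sketch (ours, not the Stratosphere IPS source) that decodes one symbol
# of the behavioral encoding into its <periodicity, duration, size> 3-tuple per Table 1.

PERIODICITY_ROWS = {
    "abcdefghi": "strong periodic",
    "ABCDEFGHI": "weak periodic",
    "rstuvwxyz": "weak non-periodic",
    "RSTUVWXYZ": "strong non-periodic",
    "123456789": "no data",
}
# Table 1 columns, left to right: size (small, medium, large) x duration (short, med, long).
SIZE_DURATION_COLS = [(size, dur) for size in ("small", "medium", "large")
                      for dur in ("short", "medium", "long")]

def decode_symbol(symbol: str):
    """Return (periodicity, duration, size) for one symbol, or None for timing symbols."""
    for row, periodicity in PERIODICITY_ROWS.items():
        if symbol in row:
            size, duration = SIZE_DURATION_COLS[row.index(symbol)]
            return periodicity, duration, size
    return None  # '.', ',', '+', '*' and '0' encode the elapsed time between flows

print(decode_symbol("z"))  # ('weak non-periodic', 'long', 'large')
print(decode_symbol("B"))  # ('weak periodic', 'medium', 'small')
```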
Fig. 2. An example behavioral encoding of a connection from IP address 147.32.84.165 to destination port 53 at IP address 147.32.80.9 using UDP.
All network flows aggregated under a single tuple are referred to as a single Stratosphere connection (SC). In other words, a single SC represents the temporal behavior from one IP address to a specific service running on a specific IP address. Several SCs can be created from a raw network traffic dataset in pcap format. This methodology constitutes the anonymization technique that the Network Pattern Extractor Submodule uses to protect the information in the network traffic. A sample of the Stratosphere IPS behavioral encoding is shown in Fig. 2. The figure shows the symbols representing all the flows of an SC based on the UDP protocol from IP address 147.32.84.165 to port 53 of IP address 147.32.80.9. In this case, the SC is represented by 24 flows (counting the characters that are letters or numbers). Note that most of the traffic was not periodic, with long intervals between 5 min and 1 h between flows (predominance of the letters z/Z and s/S), and only some flows were periodic (two occurrences of the letter B and one I). In this example, all the flows were between medium and long duration and mostly of large size. We refer to this connection example in the following sections as c-655, because this is the index it has in our connection list.

3.1.2. Feature vector extractor submodule

The Feature Vector Extractor Submodule is responsible for generating an even more condensed representation of the network traffic dataset. It summarizes an SC into a 10-dimensional numerical vector denoted as the feature vector: <x_sp, x_wp, x_wnp, x_snp, x_ds, x_dm, x_dl, x_ss, x_sm, x_sl>.
Fig. 3. Visual representations in the RiskID application.
The first four dimensions of the numerical vector represent the periodicity feature (strong periodicity (sp), weak periodicity (wp), weak non-periodicity (wnp) and strong non-periodicity (snp), respectively), the next three refer to the duration feature (duration short (ds), duration medium (dm) and duration large (dl)), and the last three represent the size feature (size short (ss), size medium (sm) and size large (sl)). The feature vector for a given connection is generated by considering, over the complete symbol sequence, the cumulative frequency of the corresponding values associated with the behavioral encoding. At the end of the sequence, the proportion of each feature is calculated and normalized to the interval [0,1]. Formally, each x_j, where j ∈ {sp, wp, wnp, snp, ds, dm, dl, ss, sm, sl}, is defined as:
$x_j = \frac{1}{N}\sum_{i=1}^{N} I(t_i \in S_j)$
where N is the number of symbols that make up the SC, t_i is the i-th symbol in the SC, and S_j is the set of characters representing feature j in the connection behavioral encoding. Finally, I(·) is the indicator function. As an example, the resulting feature vector for connection c-655 is: <sp: 0, wp: 0.13, wnp: 0.21, snp: 0.58, ds: 0, dm: 0.25, dl: 0.66, ss: 0.25, sm: 0, sl: 0.66>. Notice that, after performing the transformation, the resulting feature vector provides a similar level of information about a given SC, except for the temporal behavior (i.e., the historical information about the network flows).
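A hedged sketch of this computation is shown below; the symbol sets S_j are derived from Table 1, and whether digits and timing symbols count towards N is our assumption rather than something stated in the RiskID implementation.

```python
from typing import Dict

# A sketch (not the RiskID implementation) of the Feature Vector Extractor:
# x_j = (1/N) * sum_{i=1..N} I(t_i in S_j), with the symbol sets S_j derived from Table 1.
# Counting digits and timing symbols towards N is our assumption.
SYMBOL_SETS: Dict[str, set] = {
    "sp":  set("abcdefghi"),      # strong periodicity
    "wp":  set("ABCDEFGHI"),      # weak periodicity
    "wnp": set("rstuvwxyz"),      # weak non-periodicity
    "snp": set("RSTUVWXYZ"),      # strong non-periodicity
    "ds":  set("adgADGruxRUX"),   # short duration
    "dm":  set("behBEHsvySVY"),   # medium duration
    "dl":  set("cfiCFItwzTWZ"),   # long duration
    "ss":  set("abcABCrstRST"),   # small size
    "sm":  set("defDEFuvwUVW"),   # medium size
    "sl":  set("ghiGHIxyzXYZ"),   # large size
}

def feature_vector(sc: str) -> Dict[str, float]:
    """Cumulative frequency of each feature over the N symbols of an SC encoding."""
    n = len(sc)
    return {j: round(sum(t in s_j for t in sc) / n, 2) for j, s_j in SYMBOL_SETS.items()}

# Toy encoding, not connection c-655:
print(feature_vector("22zZ.sS+z*Z,z"))
```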
3.2. Analytics module

The Analytics Module analyzes the 10-dimensional feature vectors and groups them according to standard similarity measures. A first grouping strategy is based on clustering. Clustering is implemented using a k-means algorithm based on the L2 distance to form the groups. The optimal number of groups is selected by the Elbow method [19], which consists of increasing the number of clusters until the marginal gain in the variance explained by the model is negligible. The advantage of this technique lies in the interaction with the visual components. The clustering approach is meant to offer a first visual approximation of the similarity between SCs according to their feature vectors. In Fig. 3(b), the block on the left shows the benefits of the grouping strategy and the visual components (connections close to c-655 belong to the same cluster). A second grouping strategy is implemented considering the similarities between all the SCs in the dataset. The Analytics Module implements a similarity matrix by iterating over each SC in the dataset and ranking the remaining SCs according to the cosine distance function, much like an item-based recommender system. In this way, once a connection is selected from the list, the remaining connections are arranged by their similarity to the selected connection. This functionality improves the detection of sets of connections with similar features.
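The sketch below illustrates how the two grouping strategies could be implemented with scikit-learn; the 5% elbow threshold, the maximum number of clusters and the variable names are our assumptions, not values taken from the RiskID code.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

# Illustrative sketch of the two grouping strategies of the Analytics Module.
def elbow_kmeans(X: np.ndarray, k_max: int = 10, min_gain: float = 0.05) -> KMeans:
    """Grow k until the marginal reduction of within-cluster variance becomes negligible."""
    prev_inertia, best = None, None
    for k in range(1, k_max + 1):
        model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
        if prev_inertia is not None and (prev_inertia - model.inertia_) / prev_inertia < min_gain:
            break
        prev_inertia, best = model.inertia_, model
    return best

X = np.random.rand(200, 10)            # stand-in for the SC feature vectors
clusters = elbow_kmeans(X).labels_     # first grouping strategy: k-means clusters
similarity = cosine_similarity(X)      # second grouping strategy: similarity matrix
ranked = np.argsort(-similarity[0])    # remaining SCs ranked against a selected SC (index 0)
```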
3.3. Visual analytics module

The Visual Analytics (VA) Module presents the information obtained from the Preprocessing Module in a set of graphical widgets, using the information from the Analytics Module to enrich and organize them. The features extracted by the Preprocessing Module, together with the grouping strategies from the Analytics Module, form the basis for the visual elements.

Following the example of connection c-655, the VA Module creates a visual representation from its feature vector (Fig. 3a). Fig. 3 illustrates the different visual elements in the VA Module. An overview display shows a vertical heatmap of all the feature vectors, with different colors for the periodicity, size and duration features. The vectors are organized according to the cluster grouping. The intention is to show, from a purely statistical view, which connections belong together, so the expert can see how they are labeled. Upon picking an unlabeled connection (e.g., connection c-655 of Fig. 2), it is moved to the top of the list and a detailed view is opened (see Fig. 3b). Then, the list is reorganized by similarity, and two connections are moved up: the most similar connection labeled as Botnet and the most similar connection labeled as Normal. This feature is intended so that the expert can compare connections by their features. The most similar Botnet and Normal connections are automatically brought to the detailed view for comparison. The detailed view also shows origin and destination addresses, port and protocol. By clicking on the widgets, the user can filter the overview list by them, in order to, for example, find connections originating from the same IP.

3.4. User labeling strategy

A formative evaluation was carried out to observe the workflow and decisions taken while looking for undesired behavior in network logs. The evaluation used a dataset derived from three previously labeled datasets publicly available as part of the Malware Capture Facility Project (MCFP) [20]. Two experts participated in the study. Their task was to label connections using an earlier version of RiskID, which did not include label prediction or similarity re-ordering. The study helped us identify a labeling strategy based on filtering and multiple comparisons. The labeling workflow consisted of selecting an unlabeled connection and comparing it with labeled connections that share similar characteristics [17]. After analyzing the labels of similar connections (e.g., using the RiskID cosine similarity visualization), the unlabeled connection is assigned to the majority class. The process can be repeated until the complete dataset is labeled. It is important to mention that the aforementioned workflow requires a portion of the dataset to be previously labeled. Initial labels could be assigned following a traditional approach based on blacklisted IP addresses or services, and analyzing the SC periodicity to determine whether flows occurring at periodic intervals are observed in the connection. The above user workflow falls within the scope of semi-supervised learning, where it is clear that the more correctly labeled connections there are, the higher the probability that the remaining unlabeled connections will be correctly labeled. So, the quality of the labels in the dataset depends on: (i) the number of labeled connections, and (ii) the level of correctness of the labels. Therefore, based on this semi-supervised user workflow, we propose to include active learning intelligence that suggests labels for unlabeled connections to the user and in this way helps in the labeling process.

4. The active learning strategy

This section details the active learning strategy used to predict labels in close to real time, based on the previous decisions of the expert user. The goal behind the active learning strategy is to use the behavior information of previously labeled connections to estimate the label probabilities of connections not yet labeled.
Hereby, the intention is to help the user by providing a tool for finding early, in the haystack of SCs, those unlabeled connections that can potentially be labeled based on behavior information learnt from the available labels.
Fig. 4. Learning and prediction process iterations by percent of labeled connections.
As usual with intelligent systems carrying out tasks autonomously for users, it becomes necessary to give the user some support or evidence about why the algorithm suggests a given action. This active learning support strategy faces several challenging requirements:

R1: It must fit seamlessly into the work-flow of the application, meaning that the new autonomous learning process should coexist with the user's labeling strategies.
R2: It should cope with the shortage of initial data and predict the Botnet probability with acceptable accuracy.
R3: It must be capable of dealing with some level of noise (i.e., wrongly labeled connections) in the learning process without changing the course of the expected results.
R4: It should be an improvement over the previous user labeling strategies. More specifically, it should reduce the labeling time while improving label accuracy.
R5: It must provide the user with some evidence to raise confidence in the decision proposed by the algorithm.

Of these requirements, compliance with R1–R4 can be validated experimentally with the evaluation framework. R5 is as much an algorithm characteristic as a design problem and has to be addressed at design/implementation time.

4.1. Prediction module

The proposed active learning strategy is included in a new Prediction Submodule introduced into the Analytics Module shown in Fig. 1. It performs model learning and prediction tasks, and requires a minimal set of labeled connections to that end. This first labeling pass can be done following the strategies mentioned in Section 3.4. The Prediction Module (PM) monitors the number of labeled connections. If the number of labels rises above two percent, the PM initiates an autonomous process for learning the behavior associated to connections using the available labels (see Fig. 4). The process is carried out in the background and does not affect the user's interaction with the application (R1). After a learning cycle, the PM uses the resulting model to predict the Botnet class probability of each unlabeled connection. All unlabeled connections with a probability higher than 0.5 are indicated as Botnet, while those at or below 0.5 are indicated as Normal. Then the user is instructed to label those unlabeled connections whose probability is very close to the decision boundary. This strategy, called Uncertainty Sampling [21], guarantees that the most dubious connections are the first to be tagged by the user and thus help the prediction model. This procedure is repeated each time the number of labeled connections increases by 2%. A sketch of one such cycle is shown below.
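The following Python sketch illustrates one learning and prediction cycle of the PM under the assumptions stated in the comments; the function and variable names are ours and the number of trees is not specified by the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hedged sketch of one PM learning/prediction cycle (names and hyper-parameters are ours).
# y_labeled uses 1 for Botnet and 0 for Normal, and both classes are assumed to be present.
def prediction_cycle(X_labeled, y_labeled, X_unlabeled, n_queries=10):
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X_labeled, y_labeled)                    # background learning cycle
    p_botnet = model.predict_proba(X_unlabeled)[:, 1]  # P(Botnet) for every unlabeled SC
    suggestions = np.where(p_botnet > 0.5, "Botnet", "Normal")
    # Uncertainty Sampling: the user is asked about the connections closest to 0.5.
    query_idx = np.argsort(np.abs(p_botnet - 0.5))[:n_queries]
    return suggestions, p_botnet, query_idx

# The cycle would be re-run each time the pool of labels grows by another 2%.
```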
Fig. 5. Prediction bar and confidence level added to the connection list view after each learning and prediction process.
As a basic means of evidence for the prediction (R5), the PM outputs a Support Level (SL) for each prediction. The SL of a predicted label refers to the fraction of the connections with the same destination port that lie within the set of labeled connections used for building the prediction model:
$SL(sc_p) = \frac{|sc_{pt}|}{|sc_{pd}|}$

where $sc_p$ denotes an SC with destination port p, $sc_{pt}$ the set of connections with port p inside the set of labeled connections, and $sc_{pd}$ the set of connections with port p in the whole dataset.
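A minimal sketch of this computation follows; the function and variable names are ours and not part of RiskID.

```python
from collections import Counter

# Minimal sketch of the Support Level computation (names are ours, not from RiskID).
def support_level(port, labeled_ports, all_ports):
    """SL(sc_p) = |sc_pt| / |sc_pd| for a given destination port p."""
    labeled = Counter(labeled_ports)   # ports of the connections used to build the model
    total = Counter(all_ports)         # ports of every connection in the dataset
    return labeled[port] / total[port] if total[port] else 0.0

print(support_level(25, labeled_ports=[25, 25, 80], all_ports=[25, 25, 25, 80, 80, 443]))  # ~0.67
```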
4.2. Visual description

After the learning and prediction cycles have ended, an alert notifies the user about new label recommendations. Fig. 5 illustrates the label recommendation bar that appears next to an unlabeled connection (c-655, in the example). Each unlabeled connection receives a prediction bar, with red indicating the probability of Botnet and green the probability of Normal. Next to the bar, a numerical value indicates the SL of that prediction. This minimalist visual cue aims to make it easy to compare predictions over several SCs and decide which to pick next.

5. Evaluation framework

For an evaluation to be useful, one must consider its purpose and scope, select the appropriate metrics and correctly apply assessment techniques. According to the classification given by Staheli et al. [22] of the commonly used techniques for evaluating visualization, the most common evaluations are Usability Testing, Simulation and Performance Testing. We present an evaluation framework to analyze application performance using one kind of Simulation and the Application Performance Testing technique, leaving the evaluation with users for later work. The PM is evaluated through a set of experiments aimed at validating performance according to the aforementioned requirements. First, a preliminary study is carried out using traditional k-fold cross-validation to analyze the viability of a conventional machine learning algorithm in predicting Botnet connections. Second, we evaluate the behavior of the PM considering its integration with the RiskID application work-flow. For this particular case, evaluation of the learning rate and noise tolerance is considered. Thereafter, the PM is compared with the current state-of-the-art system, ILAB [11]. In the following subsections, we describe the dataset preparation along with the metrics selected for the proposed experiments and discuss the results.

5.1. Dataset description

The evaluation uses a total of 22 datasets divided into two groups of data called CTU-13 and CTU-19. All data were captured at CTU University, Czech Republic, between 2011 and 2017, and are publicly available as part of the Malware Capture Facility Project (MCFP) [23]. The CTU-13 [24] group of data consists of thirteen datasets (called scenarios) of different botnet samples, normal and background traffic captured in 2011. Specific malware was executed in each scenario. The malware used several protocols and performed
different actions such as SPAM, DDoS and Click Fraud, among others. In total, this group of datasets has 9241 connections, with 6394 connections labeled as "Botnet" and 2847 labeled as "Normal". The CTU-19 group of data consists of nineteen datasets (called scenarios) of different botnet samples: specifically, three botnet captures, 2013-08-20 capture-win15 [25], 2013-10-01 capture-win12 [26] and 2013-10-01 capture-win8 [26], and normal traffic including DNS, HTTPS and P2P [27]. In total, these captures represent 24,227 connections, with 15,737 connections labeled as "Botnet" and 8490 labeled as "Normal". All these captures were performed between 2013 and 2017. Fig. 6 shows a summary of the class distribution in CTU-13 (Fig. 6a) and CTU-19 (Fig. 6b) by type of connection. The X-axis shows the type of connection, represented by the most representative ports in the dataset, and the Y-axis the distribution between the Botnet and Normal classes for each. It is worth noting that CTU-13 presents more variety in types of connections than CTU-19. In CTU-13, a large number of connections come from port 25 (SMTP connections) and port 80 (HTTP connections), whereas CTU-19 has a similar distribution between ports 25, 53 (DNS connections), 80 and 443 (HTTPS connections). In both groups of datasets, all the connections coming from port 25 have been labeled as Botnet, and traffic coming from HTTP/HTTPS (ports 80/443) is mostly normal.

5.2. Metrics

Several standard metrics for network detection evaluation were used to assess the performance of the PM. These metrics are the True Positive Rate (TPR) and the False Positive Rate (FPR). TPR is computed as the ratio between the number of correctly detected malicious connections (True Positives) and the total number of malicious connections, whereas FPR is computed as the ratio between the number of normal connections incorrectly classified as malicious (False Positives) and the total number of normal connections. Other metrics are used for dealing with class imbalance: the F1-Score and the Receiver Operating Characteristic (ROC) curve. The F1-Score is computed as the harmonic mean of precision and TPR (recall). The ROC curve is a simple plot of TPR against FPR over different models; it can be reduced to a single scalar by calculating the Area Under the Curve (AUC). Finally, the Equalized Loss of Accuracy (ELA) metric is used to evaluate the model robustness in terms of noise tolerance. The ELA metric computes the loss of accuracy with respect to the case without noise [28]. ELA for an x% noise level is calculated as

$ELA_{x\%} = \frac{100 - A_{x\%}}{A_{0\%}}$

where $A_{0\%}$ is the accuracy of the classifier with a noise level of 0%, and $A_{x\%}$ is the accuracy of the classifier with a noise level of x%. A sketch of how these metrics can be computed is given below.
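The following sketch illustrates how the metrics above could be computed; it is not the authors' evaluation code, and the ELA formula follows the reconstruction given above.

```python
from sklearn.metrics import confusion_matrix, f1_score, roc_auc_score

# Illustrative computation of the metrics used in the evaluation (a sketch, not the
# authors' code). y_true / y_pred use 1 for Botnet and 0 for Normal; y_score is P(Botnet).
def detection_metrics(y_true, y_pred, y_score):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return {"TPR": tp / (tp + fn),                 # detected malicious / all malicious
            "FPR": fp / (fp + tn),                 # false alarms / all normal
            "F1": f1_score(y_true, y_pred),
            "AUC": roc_auc_score(y_true, y_score)}

def ela(acc_clean_pct, acc_noisy_pct):
    """Equalized Loss of Accuracy, ELA_x% = (100 - A_x%) / A_0%, accuracies in percent."""
    return (100.0 - acc_noisy_pct) / acc_clean_pct
```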
5.3. Prediction algorithm

Under the current implementation of RiskID, a Random Forest (RF) prediction algorithm is included in the PM. The inclusion of RF inside the PM responds to its parallelization capability: the bagging process implemented by RF makes it suitable for execution in distributed environments. This parallelization capability is a key feature for improving the usability of RiskID when a large dataset needs to be labeled, a common situation in the network security research field. In addition, RF is a solution commonly used in unbalanced data situations [29,30], a condition frequently found when labeling network traffic datasets. The RF algorithm consists of a collection of tree-structured classifiers. Each tree grows with respect to a random vector $\Theta_k$, where the $\Theta_k$, $k = 1, \ldots, L$, are independent and identically distributed.
Fig. 6. Distribution of classes in CTU-13 and CTU-19 from the perspective of the connection type.
Fig. 7. Random Forest architecture with multiple decision trees; the output class is given by the majority vote.
Each tree casts a unit vote for the most popular class at input x [31]. Fig. 7 shows an example of the discussed RF implementation. Random Input Selection (RIS) is used to generate the different trees [32]: the algorithm randomly chooses a subset S of M features from the original set of n features and seeks within S the best feature to split the node. A feature subset is selected for each node, with M = log2(n) + 1, where n is the total number of features [31]. We carried out a preliminary RF test which, although not entirely representative of the real operation of RiskID, gives a general idea of how RF works on these datasets. We evaluated the performance of RF in terms of Accuracy, FPR, TPR, AUC and F1-Score. For CTU-13 and CTU-19, 70% of the original dataset was used for training the models and the remaining 30% for testing. To deal with class imbalance, an up-sampling technique was applied over the training set. The training process was performed using k-fold cross-validation (10-fold) over the training datasets; the held-out testing dataset was then used to evaluate each model. The whole process guarantees the independence of the results. A sketch of this protocol is shown below.
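The sketch below is our reconstruction of this preliminary protocol; the number of trees, the up-sampling details and the function names are assumptions, and X and y are assumed to be NumPy arrays with y equal to 1 for Botnet and 0 for Normal.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.utils import resample

# Our reconstruction of the preliminary evaluation protocol: 70/30 split, up-sampling of
# the minority class in the training set, 10-fold CV, and max_features ~ log2(n) + 1.
def preliminary_rf_test(X, y):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

    # Up-sample the minority class of the training set only.
    minority = int(np.bincount(y_tr).argmin())
    maj_mask = y_tr != minority
    X_up, y_up = resample(X_tr[~maj_mask], y_tr[~maj_mask], replace=True,
                          n_samples=int(maj_mask.sum()), random_state=0)
    X_bal = np.vstack([X_tr[maj_mask], X_up])
    y_bal = np.concatenate([y_tr[maj_mask], y_up])

    rf = RandomForestClassifier(n_estimators=100,
                                max_features=int(np.log2(X.shape[1])) + 1,
                                random_state=0)
    cv_f1 = cross_val_score(rf, X_bal, y_bal, cv=10, scoring="f1")  # 10-fold CV on training data
    rf.fit(X_bal, y_bal)
    return cv_f1.mean(), rf.score(X_te, y_te)                       # mean CV F1, held-out accuracy
```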
Fig. 8a presents a summary of the metric values of the best RF model, and Fig. 8b presents the prediction performance of the RF model discriminated by type of connection. Both graphs result from testing RF on the two groups of data (CTU-13 and CTU-19) and averaging their values. The second figure shows the distribution of connections correctly predicted (blue bars) and incorrectly predicted (red bars). The resulting RF model was able to correctly predict all SMTP connections (port 25). Note that there is a high percentage of SMTP connections in CTU-13 and CTU-19, and the complete SMTP traffic was labeled as Botnet in both groups. On the other hand, for HTTP connections (port 80), RF showed some issues predicting all considered cases. Such results could be explained by the imbalance observed in the distribution of HTTP labels.
Fig. 8. Prediction performance of Random Forest model.
In CTU-13, the majority of HTTP traffic was labeled as "Normal" (only about 25% Botnet). There were some differences in the predictions of other types of connections, but these represent a minority portion of the dataset. These results show that RF is a viable candidate for inclusion in the RiskID Prediction Module. Hereafter, we only consider RF and perform an extensive evaluation against R1–R4.

5.4. Learning rate analysis

The PM has to deal with the problem of assigning probabilities to unlabeled connections when only a small portion of the dataset is labeled (R2). Presumably, in scenarios where not enough information is available, the estimated probabilities will not be reliable. A similar situation is observed in recommender systems, when it is necessary to make recommendations to a recent user who has no previous history with the system (cold start). Therefore, it is important to determine the amount of labeled connections required to provide reliable information to the user. The learning rate can be defined as the speed at which the PM learns new information and consequently updates the label probabilities. A system with a high learning rate will be able to adapt to new labels and provide correct predictions within a short period of time [33]. The learning rate is calculated by training the RF model with different sized portions of the training dataset and then evaluating the performance of each model on the testing portion. The experimental procedure started with a random sample of 200 connections for training. This random selection is intended to simulate the first connections labeled by the user inside RiskID. As previously explained in Section 4.1, the PM in RiskID starts when 2% of the dataset is labeled, so these 200 initial connections simulate this 2% of labeled connections in the CTU-13 dataset. In the case of the CTU-19 dataset, 200 connections represent less than 2%; despite that, we perform the analysis with a similar data distribution. After the first training of the model, each iteration increases the amount of data in the training set following an Uncertainty Sampling query selection [21]. In this way, the size of the training set was increased with the connection instances about which the model is least certain (connections in the training pool closest to a 0.5 probability of the Botnet class). This procedure was repeated until all connections in the training set were used.
The F1-Score was used to evaluate the performance of the RF model at each iteration. Each experimental scenario was simulated 30 times (i.e., 44 different training sets × 30 = 1320 simulations in total) to ensure the statistical robustness of the results. The resulting curve is plotted in Fig. 9. The X-axis refers to the size of the training set used for building the RF model, while the Y-axis refers to the mean F1 score over the 30 repetitions. In the first scenario (200 connections), the PM showed a mean test F1 score close to 0.93 using CTU-19 and close to 0.89 using CTU-13. The F1 score increased over the remaining 25 increments of the training set, reaching a value close to 0.96 using CTU-19 and close to 0.92 using CTU-13. After that point the F1-score does not show a significant improvement for either group of data. Note that for both datasets the first test already shows good results despite the little data used to train the model. Arguably, the learning rate of the RF classifier can vary for different types of connections. Such differences are caused not only by the initial disproportion in numbers between connection types, but also by the network traffic variability associated with each connection type. Fig. 10 shows the prediction performance by connection type. In particular, the figures show results when the model was generated with 2, 50 and 90 percent of labeled connections. These percentage values were selected to represent the initial, middle and final phases, respectively, of the labeling process. Even with only 2 percent of labeled data (Fig. 10a), the RF correctly detected 100 percent of SMTP connections (port 25). Such behavior can possibly be explained by the small variability of the SMTP traffic present in the dataset. In other words, since all SMTP traffic is similar, the model needed just a few samples to classify it correctly. For ports such as 80 (HTTP), 53 (DNS) and 123 (NTP), the RF model failed to detect between 21% and 31% of the connections. In such cases, given the high variability observed on those ports, it was necessary to label a larger number of connections. Figs. 10b and 10c show how the number of errors is considerably reduced as the number of labeled connections increases.

5.5. Robustness analysis

It is widely known that a classifier's performance is influenced by the quality of the labeled data used. Since the PM builds the RF model with connections labeled according to user opinions, the quality of the labels directly impacts the final prediction.
Fig. 9. Random Forest performance with incremental training data.
Fig. 10. Prediction performance by port considering different amounts of labeled connections in the dataset.
Fig. 11. ELA measure with incremental noisy data in training set.
Here we analyze the influence of wrongly labeled connections on the performance of the generated RF model; in other words, we want to analyze the RF model's tolerance to noise (R3). Clearly, a model with low tolerance to noise is not suitable, for it will also suggest noisy labels. The robustness analysis was carried out by inserting an incremental noise level in the complete training datasets. The noise level was raised from 2% to 90% in 2 percent steps. A noise level of x% was obtained by randomly switching the class label of exactly x% of the samples to the opposite class. At each 2 percent step, the noise level was increased and the performance of the PM was calculated considering a training set with 70 percent of the dataset and tested on the remaining 30 percent. The procedure was repeated 30 times for each step to ensure the statistical significance of the results (a sketch of the noise injection is shown below). Results are shown in Fig. 11 in terms of the ELA measure [28]. The X-axis represents the noise level in the training set; the Y-axis is the ELA measure averaged over the 30 repetitions for each step.
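The following sketch is our reconstruction of the label-noise injection; the function name, the random seed handling and the 0/1 label convention are assumptions.

```python
import numpy as np

# Sketch of the label-noise injection used in the robustness analysis (our reconstruction):
# flip exactly x% of the training labels to the opposite class, uniformly at random.
def inject_label_noise(y, noise_level, seed=0):
    """y: array of 0/1 labels; noise_level: fraction in [0, 1], e.g. 0.02, 0.04, ..., 0.90."""
    rng = np.random.default_rng(seed)
    y_noisy = np.asarray(y).copy()
    n_flip = int(round(noise_level * len(y_noisy)))
    flip_idx = rng.choice(len(y_noisy), size=n_flip, replace=False)
    y_noisy[flip_idx] = 1 - y_noisy[flip_idx]   # Normal <-> Botnet
    return y_noisy
```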
As expected, Fig. 11 shows that ELA increases with the noise level present in the training set, with similar performance for both the CTU-13 and CTU-19 datasets. At noise levels between 35% and 60%, the steepness of the curve becomes significant. However, ELA values remain under 0.15 for datasets with noise levels close to 30%. Such a moderate increment indicates that RF is robust to noise and reinforces its inclusion in RiskID.

5.6. Comparison with ILAB strategy

In this section we compare the labeling results of two strategies that could be included inside the PM: the RF model using Uncertainty Sampling and the ILAB strategy proposed by Beaugnon et al. [11]. ILAB implements an active learning technique through a Logistic Regression [34] model, following at each step a query selection known as Rare Category detection [16]. Unlike Uncertainty Sampling, Rare Category detection is applied separately on the instances that are more likely to be malicious (Botnet) and benign (Normal) according to the detection model. Not all kinds of connections are present in the initial pool of labeled data, and rare category detection fosters the discovery of yet unknown groups of connections to avoid sampling bias.
Fig. 12. ILAB and Random Forest performance with incremental training data.
We replicated the ILAB implementation to compare the performance of both strategies (RF using Uncertainty Sampling and ILAB), following the same methodology as the previous studies (learning rate and robustness analysis) over the same groups of data (CTU-13 and CTU-19). First, we compare how each strategy deals with the problem of assigning probabilities to unlabeled connections when only a small portion of the dataset is labeled, i.e., a learning rate analysis (Fig. 12). Then, we compare the influence of wrongly labeled connections on the performance of each strategy (Fig. 13).

Learning rate. Fig. 12a shows the results of the learning rate study for RF using Uncertainty Sampling and for the ILAB strategy on CTU-13. Note that the performance of the ILAB strategy presents little variation as the number of elements in the training set increases. Initially, training with only 2% of the elements in the training pool, ILAB shows an F1 score close to 0.87, and it ends with a value around 0.88 when the model is trained with the whole training set. A similarly small variability is obtained when testing the ILAB strategy on CTU-19 (Fig. 12b): in this case the ILAB model starts predicting connections with an F1 score just under 0.93 and ends with a value close to 0.94. On the other hand, the RF strategy presents greater variation as the training set increases. As illustrated in Section 5.4, the learning rate performance of the RF strategy increases with the amount of data in the training set. Our strategy achieves F1 scores of 0.92 and 0.95 when the complete training sets (CTU-13 and CTU-19, respectively) are used to build the models. As can be seen in both figures, our strategy obtains better results than the ILAB strategy as the training set grows.

Robustness analysis. Fig. 13 represents the influence of wrongly labeled connections on the performance of the RF model and ILAB strategies for both groups of data. As expected, the ELA value increases with the noise level in all cases. However, the results obtained using CTU-13 (Fig. 13a) show a difference between
the two strategies. In this case RF has better noise tolerance than ILAB up to a noise level close to 60% (note that the RF curve is below the curve obtained with the ILAB strategy). Beyond 60% of noisy data in the training set, the RF strategy loses performance faster (its ELA value tends to one more quickly). On the other hand, the results obtained using CTU-19 are very similar for RF and ILAB. Note that both strategies have a similar ELA value for the first 30% of the noise level. After 30%, the RF-based strategy starts to increase faster, but for noise levels above 50%, ILAB tends to one more quickly.

5.7. Impact on overall performance

The present experiment aims at evaluating the benefits provided by the PM compared with the common labeling strategy described in Section 3.4 (R4), hereafter referred to as the Simple Comparative Strategy (SCS). To this end, we simulate a user following the SCS strategy. From an algorithmic perspective, the SCS can be implemented as follows (a sketch of this procedure is given below):

1. Select the first unlabeled connection from RiskID.
2. Move the selected connection to the top of the list.
3. Reorder the remaining connections by their similarity (cosine similarity) with the selected connection.
4. Pick an odd number of labeled connections from the top of the list (excluding the selected connection) and select the majority label.

The performance of SCS in terms of F1 score was evaluated following the methodology described in Section 5.4. Fig. 14 shows the learning rate for SCS. Each point on the X-axis indicates the amount of labeled connections in the dataset, while the Y-axis refers to the average F1 score for each iteration. Despite its simplicity, SCS achieved an F1 score of about 0.89 with only 200 labeled connections (approximately 2 percent of the whole CTU-13 dataset). The F1 score increased up to 0.92 with 7500 labeled connections (i.e., 75% of the CTU-13 dataset). Fig. 15 compares the results of SCS (blue lines) with the results of the RF model (green lines). An additional random selection strategy (red lines) is also plotted in the figure as a reference.
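The sketch below illustrates the simulated SCS step; the neighbourhood size k and the function name are our assumptions, not taken from the paper.

```python
import numpy as np
from collections import Counter
from sklearn.metrics.pairwise import cosine_similarity

# Sketch of the simulated Simple Comparative Strategy (k and the names are assumptions;
# k should be odd to avoid ties in the majority vote).
def scs_label(x_unlabeled, X_labeled, y_labeled, k=5):
    """Assign the majority label among the k labeled connections most similar to x_unlabeled."""
    sims = cosine_similarity(x_unlabeled.reshape(1, -1), X_labeled).ravel()
    top_k = np.argsort(-sims)[:k]
    return Counter(np.asarray(y_labeled)[top_k]).most_common(1)[0][0]
```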
Fig. 13. ILAB and Random Forest noise tolerance in terms of the ELA measure.
Fig. 14. SCS performance with incremental training data.
Fig. 15. Random Selection, RF and SCS performances with incremental training data.
The random selection strategy consists of randomly selecting a label for each unlabeled connection. Testing against a random strategy is a practice widely used in the evaluation of recommender systems. The learning rate curves for the three strategies are similar. The mean F1 scores in the first scenario for random selection, SCS and the RF model were 0.89, 0.85 and 0.92, respectively.
Clearly, the implementation of an active learning strategy (using the RF model in this case) improves on both strategies: SCS and random selection.

6. Conclusions

In this article, we propose an active learning strategy to help the labeling process of network traffic datasets containing
Normal and Botnet connections. In particular, a new Prediction Module was developed and inserted into the RiskID application workflow. Given a partially labeled dataset, the new Prediction Module constructs a random forest from the previously labeled connections. The resulting model is capable of estimating the class probability of the remaining unlabeled connections. Once the probability model is built, the application of the Uncertainty Sampling technique instructs the user to label those unlabeled connections with a probability very close to the decision boundary (the most dubious connections), thereby helping to improve the performance of the prediction model in a future learning cycle. The prediction model was tested on a total of 22 datasets divided into two groups of data called CTU-13 and CTU-19. The viability of applying Random Forest as a connection predictor was evaluated considering the standard machine learning 70/30 split. The resulting model showed an accuracy of 0.93, providing a very accurate prediction of SMTP and HTTP traffic. However, based on the requirements elicited for the process of labeling a network traffic dataset, a more adequate evaluation process became necessary to verify the viability of including a model predictor inside RiskID. Therefore, we proposed an evaluation framework to validate the prediction module by analyzing learning rate, detection rate and robustness against noise, as well as the improvements over the existing labeling strategy. The prediction model showed a good learning rate, improving the detection accuracy as we increased the number of instances in the training set (see Fig. 9). Likewise, the prediction model was able to correctly predict all the SMTP traffic with only two percent of the data in training, and the probability estimation progressively improved as more connections were labeled. The robustness study also showed satisfactory results: the proposed model was capable of accepting 30 percent of noise in the training set and still producing correct labels (see Fig. 11). Finally, different labeling strategies were compared: the Prediction Module based on RF using the Uncertainty Sampling strategy, the ILAB strategy, SCS and the random selection strategy. It is clearly observed that the prediction module based on RF exceeds the ILAB strategy and the labeling method normally used in plain RiskID. Since the studies recreate the use of the application over time and the interaction between users and the application, we consider that this comprehensive set of studies represents a methodology for evaluating the performance of any active learning solution. We contend that this evaluation framework represents a contribution in itself, and hope that researchers will consider the methodology when validating the suitability of other assistive algorithms for similar annotation tasks. The new Prediction Module does not pretend to be determinant when deciding whether a connection is "Botnet" or "Normal"; we simply intend to steer the process. Many factors play a role in the complex labeling process. To measure the real impact of the proposed prediction module, we need to consider not only a statistical evaluation but also the user's interaction with, and confidence in, the proposed extension. Such an evaluation is beyond the scope of this paper and is the subject of future work.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Supplementary material Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.jisa.2019.102388.
References

[1] Catania C, Garcia Garino C. Automatic network intrusion detection: current techniques and open issues. Comput Electr Eng 2012;7(11):1063–73.
[2] Bhuyan MH, Bhattacharyya DK, Kalita JK. Towards generating real-life datasets for network intrusion detection. Int J Netw Secur 2015;17(6):683–701.
[3] Sommer R, Paxson V. Outside the closed world: on using machine learning for network intrusion detection. In: Proceedings of the IEEE symposium on security and privacy; 2010. p. 305–16. doi:10.1109/SP.2010.25.
[4] Sebastian G. Stratosphere research laboratory. 2015. https://stratosphereips.org/ [Online; accessed Jun-2018].
[5] University of California I. Knowledge discovery in databases DARPA archive. 1999. http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html [Online; accessed September-2016].
[6] DEFCON Hacking Conference - capture the flag archive. 2011. https://www.defcon.org/html/links/dc-ctf.html [Online; accessed April-2018].
[7] Center for applied internet data analysis. 1997. University of California, San Diego. http://www.caida.org/ [Online; accessed April-2019].
[8] Mukkavilli SK, Shetty S, Hong L. Generation of labelled datasets to quantify the impact of security threats to cloud data centers. 2016:172–84. http://www.scirp.org/journal/PaperInformation.aspx?paperID=65482. doi:10.4236/jis.2016.73013.
[9] Görnitz N, Kloft M, Rieck K, Brefeld U. Active learning for network intrusion detection. 2009. doi:10.1145/1654988.1655002.
[10] Aparicio-Navarro FJ, Kyriakopoulos KG, Parish DJ. Automatic dataset labelling and feature selection for intrusion detection systems. In: Proceedings of the IEEE military communications conference (MILCOM); 2014. p. 46–51. doi:10.1109/MILCOM.2014.17.
[11] Beaugnon A, Chifflier P, Bach F. ILAB: an interactive labelling strategy for intrusion detection. In: Dacier M, Bailey M, Polychronakis M, Antonakakis M, editors. Research in attacks, intrusions, and defenses. Cham: Springer International Publishing; 2017. p. 120–40. ISBN 978-3-319-66332-6.
[12] Soule A, Rexford J. Webclass: adding rigor to manual labeling of traffic anomalies. Comput Commun Rev 2008;38(1):35–8. doi:10.1145/1341431.1341437.
[13] Pius Owoh N, Mahinderjit Singh M, Zaaba ZF. Automatic annotation of unlabeled data from smartphone-based motion and location sensors. Sensors (Switzerland) 2018;18(7). doi:10.3390/s18072134.
[14] Lemay A, Fernandez JM. Providing SCADA network data sets for intrusion detection research. In: Proceedings of the USENIX CSET; 2016.
[15] Sperotto A, Sadre R, Van Vliet F, Pras A. A labeled data set for flow-based intrusion detection. In: Lecture notes in computer science, vol. 5843 LNCS; 2009. p. 39–50. doi:10.1007/978-3-642-04968-2_4.
[16] Pelleg D, Moore A. Active learning for anomaly and rare-category detection. Adv Neural Inf Process Syst 2004;18(2):1073–80.
[17] Guerra J, Catania CA, Veas E. Visual exploration of network hostile behavior. In: Proceedings of the ACM workshop on exploratory search and interactive data analytics - ESIDA '17; 2017. p. 51–4. doi:10.1145/3038462.3038466.
[18] Shneiderman B. The eyes have it: a task by data type taxonomy for information visualizations. Craft Inf Vis 2003:364–71. doi:10.1016/B978-155860915-0/50046-9.
[19] Kodinariya T, Makwana P. Review on determining number of cluster in K-Means clustering. Int J Adv Res Comput Sci Manag Stud 2013;1(6):90–5. www.ijarcsms.com.
[20] Malware capture facility project. 2013. Czech Technical University. https://mcfp.weebly.com/ [Online; accessed May-2019].
[21] Lewis DD, Gale WA. A sequential algorithm for training text classifiers. In: Proceedings of the 17th annual international ACM SIGIR conference on research and development in information retrieval. New York, NY, USA: Springer-Verlag New York, Inc.; 1994. p. 3–12. ISBN 0-387-19889-X. http://dl.acm.org/citation.cfm?id=188490.188495.
[22] Staheli D, Yu T, Crouser RJ, Damodaran S, Nam K, O'Gwynn D, et al. Visualization evaluation for cyber security. In: Proceedings of the eleventh workshop on visualization for cyber security - VizSec '14; 2014. p. 49–56. doi:10.1145/2671491.2671492.
[23] Garcia S. Identifying, modeling and detecting botnet behaviors in the network. Ph.D. thesis. UNICEN University; 2014. doi:10.13140/2.1.3488.8006.
[24] The CTU-13 dataset. 2011. Stratosphere Project. https://www.stratosphereips.org/datasets-ctu13/ [Online; accessed Jun-2018].
[25] The CTU-19 dataset, botnet kelihos tdptu02.exe. 2013. https://mcfp.felk.cvut.cz/publicDatasets/CTU-Malware-Capture-Botnet-3/ [Online; accessed Jun-2018].
[26] The CTU-19 dataset, botnet 39UvZmv.exe. 2013. Stratosphere Project. https://mcfp.felk.cvut.cz/publicDatasets/CTU-Malware-Capture-Botnet-1/ [Online; accessed Jun-2018].
[27] The CTU-19 dataset, normal datasets. 2013. Stratosphere Project. https://www.stratosphereips.org/datasets-normal/ [Online; accessed Jun-2018].
[28] Sáez JA, Luengo J, Herrera F. Evaluating the classifier behavior with noisy data considering performance and robustness: the equalized loss of accuracy measure. Neurocomputing 2016;176:26–35. doi:10.1016/j.neucom.2014.11.086.
[29] Ruiz-Gazeb A, Villa N. Storms prediction: logistic regression vs random forest for unbalanced data. Case Stud Bus Ind Gov Stat 2007;1(2):91–101. http://arxiv.org/ftp/arxiv/papers/0804/0804.0650.pdf.
[30] Liu M, Wang M, Wang J, Li D. Comparison of random forest, support vector machine and back propagation neural network for electronic tongue data classification: application to the recognition of orange beverage and chinese vinegar. Sens Actuators B Chem 2013;177:970–80. doi:10.1016/j.snb.2012.11.071.
[31] Breiman L. Random forests. Mach Learn 2001;45(1):5–32. doi:10.1023/A:1010933404324.
[32] Kuncheva LI. Combining pattern classifiers: methods and algorithms. 2nd ed. Hoboken, New Jersey: John Wiley & Sons, Inc.; 2014. ISBN 9781118914564. doi:10.1002/9781118914564.
[33] Avazpour I, Pitakrat T, Grunske L, Grundy J. Recommendation systems in software engineering. 2014. doi:10.1007/978-3-642-45135-5.
[34] Collins M, Schapire RE, Singer Y. Logistic regression, AdaBoost and Bregman distances. Mach Learn 2002;48(1–3):253–85. doi:10.1023/A:1013912006537.