Journal Pre-proof
Deep Recurrent Neural Network For IoT Intrusion Detection System
Muder Almiani, Alia AbuGhazleh, Amer Al-Rahayfeh, Saleh Atiewi, Abdul Razaque
PII: S1569-190X(19)30162-5
DOI: https://doi.org/10.1016/j.simpat.2019.102031
Reference: SIMPAT 102031
To appear in:
Simulation Modelling Practice and Theory
Please cite this article as: Muder Almiani, Alia AbuGhazleh, Amer Al-Rahayfeh, Saleh Atiewi, Abdul Razaque, Deep Recurrent Neural Network For IoT Intrusion Detection System, Simulation Modelling Practice and Theory (2019), doi: https://doi.org/10.1016/j.simpat.2019.102031
This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. © 2019 Published by Elsevier B.V.
Deep Recurrent Neural Network For IoT Intrusion Detection System
Muder Almiani A*, Alia AbuGhazleh A*, Amer Al-Rahayfeh B, Saleh Atiewi C, Abdul Razaque D
A Al-Hussein Bin Talal University, Ma’an, Jordan. Jordan University of Science and Technology, Irbid, Jordan.
B Al-Hussein Bin Talal University, Ma’an, Jordan.
C Al-Hussein Bin Talal University, Ma’an, Jordan.
D Department of Computer Engineering and Telecommunication, International IT University, Almaty, Kazakhstan.
A* Al-Hussein Bin Talal University, Computer Information Systems Department, Ma’an, Jordan. Email: [email protected]. [email protected]
A* Jordan University, 11942, P.O. Box 13375, Amman, Jordan. E-mail address: [email protected]
ABSTRACT
As a result of the large-scale development of the Internet of Things (IoT), cloud computing capabilities, including networking, data storage, management, and analytics, are brought very close to the edge of networks, forming Fog computing and enhancing the transfer and processing of tremendous amounts of data. As the Internet becomes more deeply integrated into business operations through IoT platforms, the demand for reliable and efficient connections increases as well. Fog and Cloud security is a topical issue associated with every data storage, management, or processing paradigm. Attacks, once they occur, have ineradicable and disastrous effects on the development of IoT, Fog, and Cloud computing. Therefore, many security systems and models have been proposed and/or implemented for the sake of Fog security. Intrusion detection systems are one of the premier choices, especially those designed using artificial intelligence. In this paper, we present a fully automated, artificial-intelligence-based intrusion detection system for Fog security against cyber-attacks. The proposed model uses multi-layered recurrent neural networks designed to be implemented for Fog computing security, very close to the end-users and IoT devices. We demonstrate the proposed model using a balanced version of the challenging NSL-KDD dataset. The performance of the model was measured using a variety of typical metrics, to which we add two further metrics for deeper insight: the Matthews correlation coefficient and Cohen’s Kappa coefficient. The experimental results and simulations demonstrate the stability and robustness of the proposed model in terms of a variety of performance metrics.
Keywords: Internet of Things, intrusion detection, Kalman filter, IoT, Recursive Network
1 INTRODUCTION
The Internet of Things (IoT) is considered the next evolution of the internet, in which the capability to connect to the internet is given to every entity [1,2]. Cisco-IBSG predicts that more than 50 billion devices will be connected to the internet by 2020 [3]. This huge number of connected devices implies a correspondingly gigantic amount of traffic and digital data generated and transferred. From the Megabyte ($10^{6}$ bytes) to the Brontobyte ($10^{27}$ bytes) and the Geopbyte ($10^{30}$ bytes), these units will be used to describe the tremendous digital pool formed by the IoT platform. As a matter of fact, 40% of IoT-created data is stored, processed, analyzed and acted upon close to the edge of the network, where the shortcomings of the cloud in meeting IoT requirements become manifest. These shortcomings and
the necessities of accelerated IoT development pave the way for the Fog computing paradigm. On the other hand, as the depth of this digital pool grows, it is likely to be disturbed by various types of attacks and penetrations [4]. Accordingly, various approaches and techniques, such as firewalls, data encryption, and user authentication, have been designed and implemented through the Fog computing paradigm to protect the IoT platform. However, attack vectors and threats keep evolving, leaving classical security techniques inefficient and ineffective for IoT security and opening the door for a new generation of intrusion detection systems built using machine learning and artificial neural networks.
A large body of research has been conducted on finding the best intelligent intrusion detection system in IoT-based environments for different types of applications [5,6]. As intrusion detection systems are only one of the major remedies applied for the sake of IoT security, there is a tendency to use more than one technique concurrently, as proposed by Alharbi et al. [7], who demonstrated a proof-of-concept system for IoT security implemented in the Fog computing layer. The proposed system is composed of a VPN server, a traffic analysis engine, a challenge-response unit, and a firewall. Each unit thwarts specific types of attacks. The VPN server secures the communication channels between IoT systems against sniffing, spoofing, and man-in-the-middle attacks. The intrusion detection system of the traffic analysis unit was used to detect DoS and DDoS attacks, with a decision-tree machine learning technique as the classification engine. To authenticate the response of the intrusion detection system, the challenge-response unit sends a challenge message whenever an intrusion is detected. If no response to this message is received, the system blocks the connection through the firewall unit. Pajouh et al.
[8] proposed a novel layered intrusion detection system for IoT backbone networks using a two-tier dimension reduction engine and a two-tier classification engine. The dimension reduction engine is composed of component analysis and linear discriminant analysis units, whereas the classification engine is composed of cascaded Naïve Bayes and certainty factor K-nearest neighbor (CF-KNN) units. The Naïve Bayes classifier was used to classify attack records, which were, in turn, refined by the CF-KNN classifier as a second filtering layer. Using the NSL-KDD dataset [9], the proposed model achieved competitive detection performance for hard-to-catch attacks, i.e., the U2R and R2L classes. Anthi et al. [10] proposed a predictive and adaptive intrusion detection system for IoT systems by capturing traffic with Wireshark over an IoT testbed network for four consecutive days and applying machine learning techniques to it. The proposed system consists of two main phases. During the first phase, they built a real IoT smart-home testbed and monitored the normal activities of each device connected to the IoT network. Then, in the second phase, malicious activities were applied to these devices, leading to anomalous network traffic. These phases fed a supervised machine learning technique, which forms the core of the intrusion detection model, with proper training data. Dovom et al. [11] employed fuzzy and fast fuzzy pattern tree methods for intrusion detection and malware categorization in IoT networks. This type of fuzzy-based technique is composed of a tree-like fuzzy top-down induction structure, where the inner nodes of the structure are fuzzy logic arithmetic operators whereas the leaves are associated with fuzzy predicates applied to input features. Using the Vx-Heaven, IoT, Kaggle and Ransomware datasets, their proposed model achieved high detection accuracy within reasonable run-times. For improved detection capability, Wang et al.
[12] implemented a logarithmic marginal density ratios transformation to convert the NSL-KDD dataset features into new, better-quality representative ones. Using the Support Vector Machine (SVM) as the classification engine, the empirical results showed robust performance in terms of detection rate and detection accuracy. Using a comprehensive representation of modern IoT attack scenarios, Zhang et al. [13] used the UNSW-NB [14] benchmark dataset to demonstrate the efficiency of machine learning-based intrusion detection. Although they used a simple multi-layer perceptron as the classifier, they used a novel feature selection engine that applies a Denoising Autoencoder (DAE) based on a weighted loss function. This feature selection technique focuses the model on attack-representative features. As another application of the UNSW-NB dataset, Koroniotis et al. [15] proposed an IoT network forensic architecture composed of the decision tree C4.5, Naïve Bayes, Association Rule Mining (ARM) and Artificial Neural Network (ANN) machine learning techniques to identify and track novel and complicated forms of current botnet attacks. As an example of the integration of SDN and IoT, Dawoud et al. [16] presented a deep learning-based intrusion
detection system for an SDN-based IoT architecture, where SDN modeling was used to enhance IoT security, scalability and resilience, whereas a Restricted Boltzmann Machine (RBM) was used as the engine for intrusion detection. The proposed model was evaluated and validated using the KDD Cup’99 dataset, where it achieved competitive performance higher than 94% in terms of precision and accuracy. Hodo et al. [17] proposed a simple multi-layered perceptron neural network trained with feedforward and backward learning algorithms for detecting DoS/DDoS attacks in IoT networks. The IoT structure was composed of five sensor nodes; one of them acted as a server relay node for data analysis while the others acted as clients. The traffic of the IoT network was captured using a network tap to avoid any modification of the live traffic. DoS attacks were conducted by sending over 10 million UDP packets to a single host, whereas DDoS attacks were conducted by sending over 10 million UDP packets to three hosts at wire speed, overflowing the server node. The proposed model was able to catch DoS/DDoS attacks with a highly competitive overall accuracy of up to 99.4%. For use in computer networks, Mohammadi and Sabokrou [18] proposed a semi-supervised intrusion detection model built using deep structured neural networks trained by generative adversarial learning. The model comprises two major phases: training and testing. The training phase, which was performed using only the packet flow of normal connections of the NSL-KDD dataset, is composed mainly of two cascaded modules. The first module consists of an encoder-decoder network, whereas the second module consists of a fully-connected neural network followed by a SoftMax classifier. The packet flow of anomalous network traffic was generated from the normal one using adversarial training, by re-constructing the normal packets via an optimized encoder-decoder network.
On the other hand, the testing phase uses the trained neural network resulting from the training phase, and KDDTest+ was used in full. The proposed model achieved a detection accuracy of 91.39%. Another semi-supervised intrusion detection system was proposed by Kumari and Varma [19], where the classification engine is composed of a hybrid combination of active learning Support Vector Machine (SVM) and Fuzzy c-means (FCM) clustering techniques. Besides the large amount of unlabeled data, the active SVM technique uses a small subset of labeled data, and it was shown that after N iterations active learning SVM exhibits detection performance comparable to that of the classical support vector machine. On the other hand, the FCM classifier was applied to the data items around the support vectors for the sake of multi-class categorization. In this model, intrusion detection was conducted using both classifier engines, SVM and FCM. If both classifiers labeled an input instance as normal, then it is considered normal with confidence. However, if the input instance was labeled as an anomaly by the SVM engine and its sub-category was determined by the FCM engine, then the instance is considered abnormal, and the nearest circle to the support vectors with the higher fuzzy membership was chosen as the sub-class. Other researchers used ensemble learning for robust IoT security, i.e. the technique of using multiple techniques, models or experts to solve a particular artificial intelligence-based problem. In the problem of intrusion detection, ensemble learning promotes better generalization, and voting between the different techniques of the ensemble provides higher detection accuracy than the individual models, as proposed by Illy et al. [20].
In this paper, we develop an intrusion detection system composed of cascaded filtering stages.
Deep multi-layered recursive neural networks are used for each filter and tuned to catch specific types of attacks that are well known in IoT environments, such as DoS, Probe, R2L, and U2R. The remainder of this paper is organized as follows. Section 2 describes the details of our proposed intrusion detection model, followed by a thorough experimental validation of the proposed model in Section 3.
2 The Proposed Model
In this section, the architecture, concept and design principles of our proposed model are presented. Figure 1 shows the general architecture of our proposed model implemented in the Fog computing layer.
Fig. 1 General framework of proposed intelligent IoT security model.
As shown in Figure 1, the proposed intelligent intrusion detection model is composed of two major engines: a traffic analysis engine and a classification engine. Traffic connection records are pre-processed in the traffic processing unit, which converts them into a format suitable for the deep neural network of the classification engine, and the connections are then classified as normal or attack by the intelligent intrusion detection engine. The proposed model can be implemented in Fog computing, very close to end-users and IoT devices. The model adopts a recurrent neural network trained by an adaptive version of the backpropagation algorithm for enhanced prediction capability in normal/attack classification. A recursive structure from the outputs of the neurons' nonlinear parts to their linear parts enables fast response and reliable real-time security protection for the IoT system. The recursive network represents the major engine of classification-based traffic analysis; namely, it analyses the network traffic that attempts to access the IoT system and raises a security alarm when an intrusion is detected. These two basic units are elaborated in the following subsections.
2.1 TRAFFIC PROCESSING ENGINE
We used the NSL-KDD dataset [9] for model training, testing, and validation. The data features that represent the input traffic of a networking system are naturally inconsistent. Thus, traffic data pre-treatment is a necessary gate for the classification engine [21]. The traffic pre-processor engine applies four pre-processing steps to raw traffic data: (1) Symbolic-to-numeric transformation. (2) Features reduction. (3) Data min-max normalization. (4) Data oversampling.
2.1.1 Symbolic-to-Numeric Transformation and Labels Encapsulation
As shown in Figure 2, a sample of off-line traffic data records shows that the first numeric field is followed by three symbolic fields that represent the protocol, service, and flag features of the connection records, respectively. In our work, we codify these fields as shown in Table 1. This step is equivalent to codifying the symbolic-valued fields (attributes) of the NSL-KDD dataset into numeric ones, with the protocol, service, and flag features each codified separately. The numeric value (attribute weight) assigned to each attribute value was selected based on the frequency of that value: as the frequency increases, the corresponding numeric value decreases. This way, the least frequent attribute values are not overwhelmed by the values of the most frequent ones.
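The frequency-based codification described above can be sketched as follows. This is a minimal illustration rather than the coding actually used in our experiments: we assume that the most frequent value receives code 1, the next most frequent code 2, and so on, and the function name `frequency_codes` and the sample values are hypothetical.

```python
from collections import Counter

def frequency_codes(values):
    """Map each symbolic value to an integer code.

    Assumption for illustration: codes are frequency ranks, so the more
    frequent a value is, the smaller its numeric code.
    """
    counts = Counter(values)
    # Sort by descending frequency; ties are broken alphabetically
    # so that the mapping is deterministic.
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {value: rank for rank, value in enumerate(ordered, start=1)}

# Made-up sample of the "protocol" field of six connection records.
protocols = ["tcp", "udp", "tcp", "icmp", "tcp", "udp"]
codes = frequency_codes(protocols)
encoded = [codes[p] for p in protocols]
```

The same function would be applied independently to the service and flag fields, so that each symbolic attribute gets its own code table.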
Fig. 2 Snapshot of NSL-KDD dataset associated with zoomed box of symbolic attributes.
As a final step of dataset codification, the labels of the attack sub-categories are encapsulated and codified into their main categories, as shown in Table 1.
Table 1. NSL-KDD class categories: main and sub-class. †
Sub-class | Category | Assigned Numeric Code
Back, Land, Neptune, Pod, Smurf, Teardrop, Mailbomb, Processtable, Udpstorm, Apache2, Worm | DoS | 1
Satan, Ipsweep, Nmap, Portsweep, Mscan, Saint | Probe | 2
Guess_password, Guess-passwd, ftp-write, Imap, Phf, Multihop, Warezmaster, Xlock, Xsnoop, Snmpguess, Snmpgetattack, Httptunnel, Sendmail, Named, Warezclient, Spy | R2L | 3
Buffer-overflow, Loadmodule, Rootkit, Perl, Sqlattack, Xterm, Ps | U2R | 4
† Normal classes are given ‘0’ as a numeric code.
Referring to Table 1, we have two types of encapsulation. For the binary engine (normal/attack classification), all records of the training dataset are encapsulated into normal and attack. For the second engine, the 40 attack labels (classes) are encapsulated into their four major categories, as shown in Table 1.
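The two encapsulations of Table 1 can be expressed as simple lookups. The sketch below is illustrative only: the sub-class spellings are copied from Table 1 (lower-cased), the names `SUBCLASSES`, `category_code` and `binary_code` are hypothetical, and the use of 1 as the binary attack code is our assumption.

```python
# Numeric codes of the main categories, as assigned in Table 1.
CATEGORY_CODE = {"normal": 0, "dos": 1, "probe": 2, "r2l": 3, "u2r": 4}

# Sub-class labels per category, spelled as in Table 1 (lower-cased).
SUBCLASSES = {
    "dos": ["back", "land", "neptune", "pod", "smurf", "teardrop", "mailbomb",
            "processtable", "udpstorm", "apache2", "worm"],
    "probe": ["satan", "ipsweep", "nmap", "portsweep", "mscan", "saint"],
    "r2l": ["guess_password", "guess-passwd", "ftp-write", "imap", "phf",
            "multihop", "warezmaster", "xlock", "xsnoop", "snmpguess",
            "snmpgetattack", "httptunnel", "sendmail", "named",
            "warezclient", "spy"],
    "u2r": ["buffer-overflow", "loadmodule", "rootkit", "perl", "sqlattack",
            "xterm", "ps"],
}

# Invert the table: sub-class label -> main category.
LABEL_TO_CATEGORY = {name: cat for cat, names in SUBCLASSES.items() for name in names}
LABEL_TO_CATEGORY["normal"] = "normal"

def category_code(label):
    """Multi-class encapsulation: sub-class label -> code 0..4."""
    return CATEGORY_CODE[LABEL_TO_CATEGORY[label.lower()]]

def binary_code(label):
    """Binary encapsulation: 0 for normal, 1 (assumed code) for any attack."""
    return 0 if label.lower() == "normal" else 1
```

For example, `category_code("Neptune")` yields the DoS code, while `binary_code("Neptune")` only records that the connection is an attack.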
2.1.2 Features Reduction
In this step, we remove all attributes whose value is constant across all records of the traffic data, since they have no effect on the analytical results of the neural network. In our work, the features removed because of their zero value reduced the data from 41 to 26 features.
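Removing constant-valued attributes amounts to keeping only the columns that take more than one distinct value. A minimal sketch, with a hypothetical function name and a made-up three-record example, is given below.

```python
def drop_constant_columns(rows):
    """Remove every column whose value is identical in all rows.

    Returns the reduced rows and the indices of the columns that were kept.
    """
    n_cols = len(rows[0])
    keep = [c for c in range(n_cols) if len({row[c] for row in rows}) > 1]
    reduced = [[row[c] for c in keep] for row in rows]
    return reduced, keep

# Made-up records: columns 0 and 2 are zero everywhere and carry no information.
rows = [
    [0.0, 5, 0, 1.2],
    [0.0, 7, 0, 3.4],
    [0.0, 5, 0, 0.1],
]
reduced, kept = drop_constant_columns(rows)
```

The list of kept indices should be stored so that the same columns can be selected from the test data.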
2.1.3 Data Min-Max Normalization
To bring the data into a range suitable for neural network inputs, the attributes of the traffic data are scaled so as to fall within a small specified range. In our work, we applied a linear transformation of the data, namely min-max normalization. Suppose that $\min_A$ and $\max_A$ represent the minimum and maximum values of feature $A$, respectively; then min-max normalization maps a value $v$ of feature $A$ to the new value $v'$ in the new range $[\mathrm{new\_min}_A, \mathrm{new\_max}_A]$ using (1):
$$v' = \frac{v - \min_A}{\max_A - \min_A}\,\big(\mathrm{new\_max}_A - \mathrm{new\_min}_A\big) + \mathrm{new\_min}_A \tag{1}$$
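Equation (1) translates directly into code. The sketch below is illustrative: the default target range of [0, 1] is an assumption, and mapping a constant column to the lower bound is a choice made here only to avoid division by zero.

```python
def min_max_scale(column, new_min=0.0, new_max=1.0):
    """Apply min-max normalization (1) to one feature column."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Constant column: map every value to the lower bound (assumed convention).
        return [new_min for _ in column]
    span = hi - lo
    return [(v - lo) / span * (new_max - new_min) + new_min for v in column]

scaled_unit = min_max_scale([2.0, 4.0, 10.0])             # target range [0, 1]
scaled_sym = min_max_scale([2.0, 4.0, 10.0], -1.0, 1.0)   # target range [-1, 1]
```

The minimum and maximum should be taken from the training data and reused for the test data, so that both are scaled consistently.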
2.1.4 Data Oversampling
This step is essential in dataset pre-processing to address the issue of dataset imbalance. The NSL-KDD dataset consists of about 125,000 records; the numbers of normal, DoS, Probe, R2L and U2R records are 67,343, 45,927, 11,656, 995 and 52, respectively. Graphically represented, Figure 3 shows that normal records represent about 50% of the dataset, followed by DoS and Probe records, while the remaining classes compose less than 5% of the training dataset. As a result of this obvious imbalance, the neural network would show biased classification behaviour towards normal and DoS records and a weak classification response to the less frequent attack classes. Owing to the low frequency of the R2L and U2R attack classes, the neural network would treat them as noisy signals, because their effect on weight updating is negligible, which yields noticeably weak detection of these particular types of attacks. As a remedy, both R2L and U2R attacks are oversampled by inserting repeated blocks of U2R and R2L records at different sites of the data body. Oversampling resulted in new statistics and a new distribution for these rare types of attacks, as can be noted in Figure 3.
Fig. 3 Graphical comparison of attacks distribution for raw NSLKDDTrain+ and our proposed balanced version.
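The block-insertion oversampling described above can be sketched as follows. This is only one possible reading of "inserting repeated blocks at different sites": we assume evenly spaced insertion points, and the function name, the number of copies and the toy data are hypothetical.

```python
def oversample(records, labels, minority, copies):
    """Insert `copies` extra blocks of the minority-class records at evenly
    spaced positions of the data (assumed placement strategy)."""
    data = list(zip(records, labels))
    block = [item for item in data if item[1] in minority]
    if not block or copies <= 0:
        return list(records), list(labels)
    step = max(1, len(data) // copies)
    out, inserted = [], 0
    for i, item in enumerate(data, start=1):
        out.append(item)
        if i % step == 0 and inserted < copies:
            out.extend(block)          # one repeated block of rare records
            inserted += 1
    out.extend(block * (copies - inserted))  # leftover copies, if any
    return [r for r, _ in out], [l for _, l in out]

# Toy data: codes 3 (R2L) and 4 (U2R) are the rare classes.
records = ["r0", "r1", "r2", "r3", "r4", "r5"]
labels = [0, 1, 0, 4, 0, 3]
new_records, new_labels = oversample(records, labels, minority={3, 4}, copies=2)
```

In this toy run each rare record appears three times after oversampling (once originally and once per inserted block).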
2.2 INTRUSION DETECTION ENGINE
Our proposed model consists of two cascaded detection tiers that use two deep recursive neural networks with different internal structures, setup parameters and hyperparameters, as shown in Figure 4. The first layer targets DoS attack detection, as DoS is considered one of the major attack types that threaten IoT systems, besides detecting other types of attacks.
Fig. 4 Pipeline of proposed IoT intrusion detection model.
For a high level of security, the normal response of the first filter is re-filtered by another network in the second layer, with a different internal structure, recursive gain and parameter setup, which is tuned to catch the attacks that leaked out of the first layer, especially the hard-to-detect U2R and R2L attacks. For this purpose, the second layer of the proposed model was trained on the same training dataset as the first filter network, except that DoS attacks were excluded for more focused training. We used a deep proportional recursive network structure and a modified version of the backpropagation algorithm as the training algorithm to develop a stable, intelligent, multi-layered intrusion detection model. Originally, Scalero and Tepedelenlioglu [22] applied a modified version of the backpropagation algorithm to a simple feedforward neural network, where the target mean-squared error was defined between the desired and actual summation outputs of the linear parts of the neural network. However, in our work, rather than using a simple feedforward structure, we applied the modified backpropagation algorithm to a deep proportional recursive network structure, as shown in Figure 5.
Fig. 5 Illustrative 3-layered neural network architecture of proposed intrusion detection model.
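The control flow of the two-tier cascade of Figure 4 can be sketched independently of the networks themselves. In the sketch below the two tiers are arbitrary callables; the toy rules standing in for the trained networks, the field names and the helper names are all hypothetical.

```python
NORMAL = 0
DOS = 1

def cascade_detect(record, first_tier, second_tier):
    """Pass a record through the first tier; only records it labels as
    normal are re-filtered by the second tier."""
    verdict = first_tier(record)
    if verdict != NORMAL:
        return verdict            # first tier raised an alarm
    return second_tier(record)    # second look at the "normal" response

def second_tier_training_set(records, labels):
    """Training data of the second tier: same data with DoS records excluded."""
    return [(r, l) for r, l in zip(records, labels) if l != DOS]

# Stand-in classifiers (made-up rules, not trained networks).
def toy_first_tier(record):
    return DOS if record["count"] > 100 else NORMAL

def toy_second_tier(record):
    return 3 if record["failed_logins"] > 3 else NORMAL

traffic = [
    {"count": 500, "failed_logins": 0},
    {"count": 2, "failed_logins": 5},
    {"count": 2, "failed_logins": 0},
]
verdicts = [cascade_detect(r, toy_first_tier, toy_second_tier) for r in traffic]
```

The design choice illustrated here is that a record is reported as normal only if both tiers agree, which is what allows the second tier to specialise in the rare U2R and R2L classes.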
The deep recursive structure creates a non-linear proportional embedding of the previous state in the current state: the response of each neuron to the previous pattern is fed back to its summation unit, scaled by ξ, where ξ represents the recursive gain. Traditional neural network structures trained by the traditional backpropagation algorithm suffer from the exploding gradient problem, where the setup parameters or the hyperparameters in the hidden layers do not force the change as expected, or may force the neural network into an unstable state. Therefore, adding a proportional feedback from the previous state to the current state alleviates this problem and enhances the stability of the neural network response. Referring to Figure 5, the feedforward path consists primarily of two major parts: linear and non-linear. The weighted edges and the output of the summation compose the linear part, while the non-linear activation function composes the non-linear one. The proposed training algorithm can be decomposed into the following four steps: (1) Feedforward computation. (2) Backpropagation to the output layer. (3) Backpropagation to the hidden layer. (4) Weights update.
The algorithm is stopped when the mean-squared value of the error function has become sufficiently small or when the maximum number of iterations is reached. These major steps are mathematically illustrated in the following subsections.
2.2.1 Feedforward Computation
Without loss of generality, and to simplify the exposition of the training algorithm, we deal with a three-layered network, $\mathcal{L} = 3$, as: {Input, Hidden, Output}. The major goal of training is to attain the optimal set of network parameters/hyperparameters for the hidden and output layers for the sake of optimal classification performance. Consider a network with an input vector $\mathbf{x}(t) = [x_1(t), \dots, x_N(t)]$ for pattern $t$, $H$ hidden nodes and $M$ output nodes. The weight between input $i$ and hidden node $j$ will be called $w_{h,ji}$, whereas the weight between hidden node $j$ and output unit $k$ will be called $w_{\mathcal{L},kj}$. The ξ-weighted response of each neuron to the previous pattern is returned back to the input of its summation unit, and it is implemented as an additional weighted edge. Taking the bias into account as well, the length of the input vector is extended by two, one entry for the bias and one for the ξ-weighted edge, and this is applied to all layers. The excitation, i.e. the output of the linear part, of the $j$-th neuron of the hidden layer is given by (2):
$$y_{h,j}(t) = \sum_{i=1}^{N} w_{h,ji}\,x_i(t) + w_{h,j0} + \xi\,x_{h,j}(t-1) \tag{2}$$
where $w_{h,j0}$ is the bias weight and $x_{h,j}(t-1)$ is the response of the neuron to the previous pattern. We choose the symmetrical sigmoid as the transfer function for all nodes of the network; the output of the $j$-th hidden node is thus given by (3):
$$x_{h,j}(t) = f\big(y_{h,j}(t)\big) \tag{3}$$
where $f(\cdot)$ represents the response of the nonlinear parts of the neurons and is chosen to be a symmetrical sigmoid as in (4):
$$f(y) = \frac{1 - e^{-a y}}{1 + e^{-a y}} \tag{4}$$
where $a$ is the sigmoid slope. The outputs of all nodes of the hidden layer can be computed with a vector-matrix multiplication as in (5):
$$\mathbf{x}_{h}(t) = f\big(\mathbf{W}_{h}\,\mathbf{x}(t) + \mathbf{w}_{h,0} + \xi\,\mathbf{x}_{h}(t-1)\big) \tag{5}$$
The same applies to the output layer as in (6):
$$\mathbf{x}_{\mathcal{L}}(t) = f\big(\mathbf{W}_{\mathcal{L}}\,\mathbf{x}_{h}(t) + \mathbf{w}_{\mathcal{L},0} + \xi\,\mathbf{x}_{\mathcal{L}}(t-1)\big) \tag{6}$$
These formulas can be generalized to any number of layers. Within the feedforward step, the vector $\mathbf{x}(t) = [x_1(t), \dots, x_N(t)]$ is presented to the network. Consequently, the vectors $\mathbf{x}_{h}(t)$ and $\mathbf{x}_{\mathcal{L}}(t)$ are computed as in Figure 5. The values of the activation function $f$, its derivative $f'$ and its inverse $f^{-1}$ are also computed at each unit.
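The feedforward step of one recursive layer can be sketched as follows. This is a minimal illustration under stated assumptions, not the MATLAB implementation used in our experiments: we assume that the previous activation of each neuron enters its own summation unit with the fixed gain ξ, and that the symmetrical sigmoid has the form (1 - e^{-ay}) / (1 + e^{-ay}), which equals tanh(ay/2). All names and numbers are hypothetical.

```python
import math

def sym_sigmoid(y, a=1.0):
    """Symmetrical sigmoid with slope a, written as tanh(a*y/2) for
    numerical stability (algebraically the same function)."""
    return math.tanh(a * y / 2.0)

def layer_forward(weights, biases, inputs, prev_outputs, xi, a=1.0):
    """Return (summation outputs, activations) of one layer for one pattern.

    weights[j][i] connects input i to neuron j; prev_outputs[j] is the
    activation of neuron j for the previous pattern, fed back with gain xi
    (assumed form of the recursive edge).
    """
    sums = []
    for w_row, b, prev in zip(weights, biases, prev_outputs):
        linear = sum(w * x for w, x in zip(w_row, inputs))
        sums.append(linear + b + xi * prev)
    return sums, [sym_sigmoid(s, a) for s in sums]

# Two consecutive patterns through a 2-input, 1-neuron layer (made-up numbers).
weights, biases, xi = [[0.5, -0.5]], [0.0], 0.1
state = [0.0]                                   # no history before the first pattern
sums1, state = layer_forward(weights, biases, [1.0, 0.0], state, xi)
sums2, state = layer_forward(weights, biases, [0.0, 0.0], state, xi)
```

In the second call the external input is zero, so the summation output consists only of the fed-back term, which makes the effect of the recursive gain visible.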
2.2.2 Backpropagation to the Output Layer
In this step, we are looking for the first set of partial derivatives of the error signal $E_k$ with respect to $w_{\mathcal{L},kj}$, i.e. $\partial E_k / \partial w_{\mathcal{L},kj}$. However, our error signal differs from the one used by conventional versions of backpropagation neural networks. The error signal is the total mean-squared difference between the actual summation outputs $y_{\mathcal{L},k}(t)$ and the desired summation outputs $d_k(t)$. For the output layer (7):
$$E_k = \frac{1}{T}\sum_{t=1}^{T}\big(d_k(t) - y_{\mathcal{L},k}(t)\big)^2 \tag{7}$$
Where:
$E_k$: error signal of the $k$-th neuron of the output layer.
$t$: pattern index.
$T$: total number of patterns.
$d_k(t)$: desired summation output of the $k$-th neuron at the $t$-th pattern.
$y_{\mathcal{L},k}(t)$: actual (estimated) summation output of the $k$-th neuron of the output layer at the $t$-th pattern.
The error signal $E_k$ can be minimized by taking the partial derivative of $E_k$ with respect to each weight $w_{\mathcal{L},kj}$ and equating it to zero as in (8):
$$\frac{\partial E_k}{\partial w_{\mathcal{L},kj}} = -\frac{2}{T}\sum_{t=1}^{T}\big(d_k(t) - y_{\mathcal{L},k}(t)\big)\,x_{h,j}(t) = 0, \qquad j = 1, \dots, H \tag{8}$$
Where,
$H$: total number of neurons in the hidden layer.
$x_{h,j}(t)$: the output of the $j$-th neuron of the hidden layer for the $t$-th pattern.
For each neuron of the output layer, substituting the summation output $y_{\mathcal{L},k}(t)$ of (6) and following the derivation steps of [22], we end up with the following solution (9):
$$\mathbf{p}_k = \mathbf{R}\,\mathbf{w}_{\mathcal{L},k} \tag{9}$$
Where,
$\mathbf{p}_k = \sum_{t} d_k(t)\,\mathbf{x}_{h}(t)$: cross-correlation vector between the training instances and the corresponding desired summation outputs.
$\mathbf{R} = \sum_{t} \mathbf{x}_{h}(t)\,\mathbf{x}_{h}^{T}(t)$: autocorrelation matrix of the training instances.
$\mathbf{w}_{\mathcal{L},k}$: weight matrix.
According to (9), the solution to the partial differential equation of (8) is $\mathbf{w}_{\mathcal{L},k} = \mathbf{R}^{-1}\mathbf{p}_k$, where $\mathbf{R}^{-1}$ is the inverse of the autocorrelation matrix $\mathbf{R}$. For the sake of implementation, Scalero and Tepedelenlioglu [22] adopted on-line training. This mode of training poses a problem, because the estimate for one layer depends on the data received from the previous one. At the beginning of training, the previous layer is still untrained and yields inaccurate correlation estimates, in contrast to those attained at the end of training, owing to the accumulating nature of the weight-correction approach; i.e., as the number of patterns involved in training increases, the estimates of the network tend to be more accurate. To resolve this dependency, Scalero and Tepedelenlioglu [22] added a forgetting factor to the recursive (pattern-dependent) form, which forces the effect of previous training on the current estimates of the network to be negligible, as illustrated in (10) and (11):
$$\mathbf{R}(t) = b\,\mathbf{R}(t-1) + \mathbf{x}_{h}(t)\,\mathbf{x}_{h}^{T}(t) \tag{10}$$
$$\mathbf{p}_k(t) = b\,\mathbf{p}_k(t-1) + d_k(t)\,\mathbf{x}_{h}(t) \tag{11}$$
Where $b$ is a forgetting factor, $0 \le b \le 1$. If the correlations of the network are specified in this way, the problem of passing large data into the network is solved. However, since attacks evolve and change in their attributes, frequency, and complexity, the approach of on-line training with a forgetting factor to speed up the process is not sufficient for this type of pattern classification. Therefore, in our work, we build our algorithm to work in batch mode for accurate attack detection, we set the forgetting factor in equations (10) and (11) to 0.99, and we add internal recursive units that are independent of the dataset correlations. These recursive units are of internal dependence, i.e., instead of
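depending on the response of the whole network to the previous input pattern, our recursive units enable a dependency at neuron level on the previous ξ-weighted response of each neuron to the previous input pattern, in batch mode. To sum up, we have two forms of recursion in this network: one determines the dependence on the correlation of previous input patterns, represented by $\mathbf{R}$ and $\mathbf{p}$, and the other determines the dependence on the weighted output of each neuron for previous patterns, represented by ξ. In contrast to [22], we run both recursive forms in batch mode. $\mathbf{R}$ and $\mathbf{p}$ represent recursive forms that accelerate the rate of learning, whereas the recursive form ξ enables discriminative weight updating against attack instances that have overlapping features. Returning to Figure 5, we note that the recursive edge of gain ξ has no associated weight value. Thus, it is not involved in the weight updating steps. In other words, we can regard the ξ-weighted edge as the basic catalyst for enhanced discriminative weight updating. Now the problem is reduced to solving for $\mathbf{w} = \mathbf{R}^{-1}\mathbf{p}$. Since this type of problem belongs to recursive least squares filtering [23], the weight matrix can be found using Kalman filtering. For the $t$-th pattern, the Kalman gain $\mathbf{k}_{\ell}(t)$ of layer $\ell$ is given as in (12):
$$\mathbf{k}_{\ell}(t) = \frac{\mathbf{R}_{\ell}^{-1}(t-1)\,\mathbf{x}_{\ell-1}(t)}{b + \mathbf{x}_{\ell-1}^{T}(t)\,\mathbf{R}_{\ell}^{-1}(t-1)\,\mathbf{x}_{\ell-1}(t)} \tag{12}$$
Where,
$$\mathbf{R}_{\ell}^{-1}(t) = \frac{1}{b}\Big[\mathbf{R}_{\ell}^{-1}(t-1) - \mathbf{k}_{\ell}(t)\,\mathbf{x}_{\ell-1}^{T}(t)\,\mathbf{R}_{\ell}^{-1}(t-1)\Big] \tag{13}$$
$\mathbf{R}_{\ell}^{-1}(t)$: inverse of matrix $\mathbf{R}$ of layer $\ell$ for the $t$-th input pattern.
$\mathbf{k}_{\ell}(t)$: Kalman gain of layer $\ell$ for the $t$-th input pattern.
$\mathbf{x}_{\ell-1}(t)$: output of layer $\ell-1$ for the $t$-th input pattern.
$b$: forgetting factor.
$\mathbf{x}_{\ell-1}^{T}(t)$: transposed output of layer $\ell-1$ for the $t$-th input pattern.
Based on equations (12) and (13), the backpropagated signal of the $k$-th node of the output layer $\mathcal{L}$ is:
$$e_{\mathcal{L},k}(t) = f'\big(y_{\mathcal{L},k}(t)\big)\,\big(o_k(t) - x_{\mathcal{L},k}(t)\big) \tag{14}$$
where $o_k(t)$ is the desired network output of the $k$-th output node. The partial derivative we are looking for is:
$$\frac{\partial E}{\partial w_{\mathcal{L},kj}} = -\,e_{\mathcal{L},k}(t)\,x_{h,j}(t) \tag{15}$$
where $f'(\cdot)$ is the first derivative of the neuron transfer function $f$ with respect to $y$.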
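The Kalman gain and inverse-correlation recursions can be sketched in plain Python as below. The sketch implements the standard recursive least squares update with a forgetting factor, which is what this subsection relies on via [22] and [23]; the function name and the list-of-lists matrix representation are our own choices for illustration, and the default factor of 0.99 is the value stated above.

```python
def kalman_step(r_inv, x, b=0.99):
    """One recursive-least-squares step for a layer.

    r_inv: current inverse correlation matrix (list of rows)
    x:     output vector of the previous layer for the current pattern
    b:     forgetting factor
    Returns (Kalman gain vector, updated inverse correlation matrix).
    """
    n = len(x)
    # R^{-1} x
    r_x = [sum(r_inv[i][j] * x[j] for j in range(n)) for i in range(n)]
    # b + x^T R^{-1} x
    denom = b + sum(x[i] * r_x[i] for i in range(n))
    gain = [v / denom for v in r_x]
    # x^T R^{-1}
    x_r = [sum(x[i] * r_inv[i][j] for i in range(n)) for j in range(n)]
    new_r_inv = [[(r_inv[i][j] - gain[i] * x_r[j]) / b for j in range(n)]
                 for i in range(n)]
    return gain, new_r_inv

# Sanity check with b = 1: starting from R = I, adding x x^T for x = [1, 0]
# gives R = [[2, 0], [0, 1]], whose inverse is [[0.5, 0], [0, 1]].
identity = [[1.0, 0.0], [0.0, 1.0]]
gain1, r_inv1 = kalman_step(identity, [1.0, 0.0], b=1.0)
gain2, r_inv2 = kalman_step(r_inv1, [0.0, 2.0], b=1.0)
```

The update never inverts a matrix explicitly, which is the practical reason for preferring the recursive form over solving the normal equations for every pattern.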
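2.2.3 Backpropagation to the Hidden Layer
Now, our task is to compute the partial derivative $\partial E / \partial w_{h,ji}$ for the hidden layers, where each neuron $j$ is connected to each neuron $k$ in the output layer with a weighted edge $w_{\mathcal{L},kj}$, for all $k = 1, \dots, M$, as shown in Figure 5. Thus, the backpropagated error up to unit $j$ in the hidden layer is computed as in (16):
$$e_{h,j}(t) = f'\big(y_{h,j}(t)\big)\sum_{k=1}^{M} e_{\mathcal{L},k}(t)\,w_{\mathcal{L},kj} \tag{16}$$
And the partial derivative of the error function $E$ with respect to $w_{h,ji}$ is given by (17):
$$\frac{\partial E}{\partial w_{h,ji}} = -\,e_{h,j}(t)\,x_i(t) \tag{17}$$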
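2.2.4 Weights Update
Since all partial derivatives have been computed, the network weights are updated in the negative gradient direction using the Kalman gain. For the output layer $\mathcal{L}$, the update is given in (18):
$$\mathbf{w}_{\mathcal{L},k}(t) = \mathbf{w}_{\mathcal{L},k}(t-1) + \mathbf{k}_{\mathcal{L}}(t)\,\big(d_k(t) - y_{\mathcal{L},k}(t)\big) \tag{18}$$
Where $d_k$ is the inverse function value of the network target output $o_k$, as given mathematically by (19):
$$d_k = f^{-1}(o_k) = \frac{1}{a}\,\ln\!\left(\frac{1 + o_k}{1 - o_k}\right) \tag{19}$$
Where,
$o_k$: desired network output (target) of the $k$-th node of the output layer $\mathcal{L}$.
$d_k$: the desired summation of the $k$-th node of the output layer $\mathcal{L}$.
On the other hand, for the hidden layer, weight updating proceeds as in (20):
$$\mathbf{w}_{h,j}(t) = \mathbf{w}_{h,j}(t-1) + \mu_h\,\mathbf{k}_{h}(t)\,e_{h,j}(t) \tag{20}$$
Where $\mu_h$ is the backpropagation step size of the hidden layer. The proposed algorithm can be extended to any number of hidden layers. In our proposed model, we adopted the Mean-Squared Error (MSE) and the number of iterations as measures of network output convergence.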
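The weight-update step can be sketched as follows. The sketch follows the update rules of the algorithm in [22] as we understand them: output-layer weights move along the Kalman gain in proportion to the summation error, and hidden-layer weights in proportion to the backpropagated error scaled by a step size. The inverse of the symmetrical sigmoid assumes the form (1 - e^{-ay}) / (1 + e^{-ay}); all function names and numbers are hypothetical.

```python
import math

def desired_summation(target, a=1.0):
    """Inverse of the symmetrical sigmoid: the summation output that would
    produce `target`. Targets must lie strictly between -1 and 1."""
    return math.log((1.0 + target) / (1.0 - target)) / a

def update_output_weights(weights, gain, desired_sum, actual_sum):
    """Output-layer update for one neuron: w <- w + k * (d - y)."""
    return [w + k * (desired_sum - actual_sum) for w, k in zip(weights, gain)]

def update_hidden_weights(weights, gain, backprop_error, step_size):
    """Hidden-layer update for one neuron: w <- w + mu * k * e."""
    return [w + step_size * k * backprop_error for w, k in zip(weights, gain)]

# Made-up numbers for a neuron with two incoming edges.
d = desired_summation(0.9)      # target 0.9 instead of 1.0 keeps the log finite
new_out = update_output_weights([0.0, 1.0], [0.5, 0.0], desired_sum=2.0, actual_sum=1.0)
new_hid = update_hidden_weights([1.0, -1.0], [0.5, 0.25], backprop_error=0.2, step_size=0.1)
```

Because the inverse sigmoid diverges at plus or minus one, class targets are usually set slightly inside the open interval, as in the example.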
3 SIMULATION RESULTS
This section presents the experiments carried out and the simulation results obtained from running our intrusion detection model with different running parameters. It also discusses these results and provides a comparison with previous works. To build our model, we used an Intel® Core™ i7 4Due 2.4,1.8GHZ CPU and 8.0 GB RAM configured with Windows 10. The model was developed in the MATLAB® 2018b [24] environment. Regarding our dataset, as stated in our model exposition, the NSL-KDD dataset was chosen as our training and testing benchmark dataset. We could have used the KDD Cup’99 dataset for this purpose, but the immensity of the KDD Cup’99 dataset poses a serious issue. Due to the high redundancy of records, many machine-learning and artificial intelligence-based intrusion models run on the KDD Cup’99 dataset showed high performance, reaching up to 99% in all aspects of performance measurement, without notable trade-offs or consolidated tuning operations. Therefore, it would be inequitable to use the KDD Cup’99 dataset as a basis for comparing different machine learning models in terms of detection performance. According to [9], the redundancy of the KDD Cup’99 dataset is 78% and 75% in the train and test datasets, respectively. Thus, intelligent systems/models are trained on duplicated records. To make it worse, these systems/models are validated and tested on duplicated records as well. Thus, the problem of inaccurate and unreliable detection performance is amplified. On the other hand, despite the merits of the NSL-KDD dataset, removing the redundancy of the KDD Cup’99 dataset aggravates the imbalance between the highly frequent attacks {DoS, Probe} and the low-frequency attacks {U2R, R2L}, which was solved in our work using the data oversampling tactic stated earlier in our model exposition. In this layered model, the confusion matrix is the building block of all performance metrics, and it was generated for each layer.
It includes significant information about the actual and predicted output classes. The following quantities are read from the confusion matrix:
• True Positive (TP): the number of attack records correctly classified as attacks.
• True Negative (TN): the number of normal records correctly classified as normal.
• False Positive (FP) and False Negative (FN): these values count incorrect classifications. An FN is recorded when an attack record is classified as a normal one; this is a critical problem for the confidentiality and availability of network resources, since the attacker succeeds in passing through the intrusion detection system. An FP, on the other hand, is recorded when a normal record is classified as an attack. A false positive is essentially an alarm raised on acceptable behaviour, also called a false alarm.
Table 2 presents these concepts in the framework of the confusion matrix.
Table 2: Typical confusion matrix for binary classification.
| | Predicted Class: Normal | Predicted Class: Attack |
| Actual Class: Normal | TN | FP |
| Actual Class: Attack | FN | TP |
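The four counts in Table 2 can be tallied directly from paired actual and predicted labels. The sketch below is illustrative only: the function name `confusion_counts`, the label strings, and the toy records are assumptions, and attack is treated as the positive class, as in Table 2.

```python
from collections import Counter

def confusion_counts(actual, predicted, positive="attack"):
    """Tally TP, TN, FP, FN for a binary problem (positive class = attack)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = Counter()
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            counts["TP"] += 1   # attack flagged as attack
        elif a != positive and p != positive:
            counts["TN"] += 1   # normal passed as normal
        elif a != positive and p == positive:
            counts["FP"] += 1   # normal flagged as attack (false alarm)
        else:
            counts["FN"] += 1   # attack passed as normal (missed attack)
    return {key: counts[key] for key in ("TP", "TN", "FP", "FN")}

# Hypothetical toy labels, not taken from NSL-KDD.
actual = ["attack", "attack", "normal", "normal", "attack", "normal"]
predicted = ["attack", "normal", "normal", "attack", "attack", "normal"]
result = confusion_counts(actual, predicted)
print(result)
```

Keeping the four counts in a dictionary makes it straightforward to pass them to whichever metric is computed next.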
Based on the confusion matrix defined in Table 2, we define the following performance metrics, which are specified mathematically in (21-26) and fully described in [25]: (21) (22) (23) (24) (25) (26)
While most performance measures focus on detection rate and detection accuracy, in our work we adopt two additional performance metrics: Cohen's Kappa and the Matthews Correlation Coefficient. The main reason for this addition is to measure the stability of the recursive network's performance. The Matthews Correlation Coefficient (MCC) ranges between -1 and 1, where -1 indicates a completely wrong binary classification and 1 a completely correct one. This metric allows us to gauge how robustly our classification engine is performing, and it is given in (27):
$$\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}} \qquad (27)$$
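For reference, the rate metrics reported in this section (detection rate, accuracy, precision, false positive rate, false negative rate, and F1-score) are conventionally defined from the entries of Table 2 as sketched below. These are the standard textbook forms with attack taken as the positive class; how they map onto the individual equation numbers (21-26) is an assumption.

```latex
\begin{align*}
\text{Detection rate} &= \frac{TP}{TP + FN} &
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Precision} &= \frac{TP}{TP + FP} &
\text{False positive rate} &= \frac{FP}{FP + TN} \\
\text{False negative rate} &= \frac{FN}{FN + TP} &
F_1 &= \frac{2 \cdot \text{Precision} \cdot \text{Detection rate}}{\text{Precision} + \text{Detection rate}}
\end{align*}
```

Note that the detection rate and the false negative rate sum to one, so reporting either determines the other.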
In predictive modelling, the performance metrics in (21-26) do not give a complete picture of the classification, especially for a highly imbalanced dataset such as the one we work with. Cohen's Kappa coefficient К is a powerful measure that handles imbalanced classes effectively, and it is given mathematically in (28): (28)
Where, (29) (30) Where, (31) (32)
Basically, the Kappa coefficient indicates how much better our classification engine performs than a classifier that depends only on the random frequency of the classes. Depending on the numeric value of К, this in turn reflects the robustness and stability of the classification engine against the least frequent, difficult-to-catch attacks. Landis and Koch (1977) [26] considered К values ≤ 0 as indicating a useless classifier, 0-0.20 as slight agreement, 0.21-0.40 as fair, 0.41-0.60 as moderate, 0.61-0.80 as substantial, and 0.81-1 as almost perfect agreement.
Figure 6 shows the 1st detection layer of our proposed layered intrusion detection model. This layer is considered the first line of defence; therefore, binary detection rate, binary detection accuracy, and response time are of high priority.
Fig. 6. Block diagram of first detection layer of proposed model associated with network free parameters.
For the first layer simulations, we used 68,000 training records and 40,000 testing records. The training dataset was composed of {normal = 33,901 | DoS = 23,390 | Probe = 5,356 | R2L = 4,640 | U2R = 713}, whereas the testing dataset was composed of {normal = 19,657 | DoS = 13,855 | Probe = 3,446 | R2L = 2,079 | U2R = 277}. As can be noted from Figure 6, the classification performance of the recursive neural network is affected by various free network parameters. To obtain performance as close to optimal as possible, the initial weights and the initial value of the autocorrelation matrix were the first parameters that had to be set. After 500 training iterations, these values were fixed, and the remaining parameters were then varied, starting with the recursive gain ξ, until we obtained the most satisfactory performance, as illustrated in Table 3 and Table 4.
Table 3: Confusion matrix for binary classification of 1st detection layer.
| | Predicted Class: Normal | Predicted Class: Attack |
| Actual Class: Normal | 19369 | 948 |
| Actual Class: Attack | 2084 | 17573 |
Table 4. Performance measurements of the 1st detection layer.
| Performance Metric | Value |
| Detection Rate | 95.34% |
| Accuracy | 92.42% |
| Precision | 90.30% |
| False-positive Rate | 10.06% |
| Matthews correlation coefficient | 0.8496 |
| Cohen's Kappa К | 0.8482 |
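As a worked check, the two stability metrics can be recomputed from the counts printed in Table 3. The sketch below uses the standard definitions of MCC and Cohen's Kappa; the function names and the `p_observed`/`p_chance` variable names are illustrative assumptions, not notation from this paper. Accuracy, MCC, and Kappa are unchanged when the positive and negative classes are swapped, so they can be compared with Table 4 without fixing that convention.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a 2x2 confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def cohen_kappa(tp, tn, fp, fn):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = tp + tn + fp + fn
    p_observed = (tp + tn) / n
    # Chance agreement: product of actual and predicted marginals, per class.
    p_chance = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    return (p_observed - p_chance) / (1 - p_chance)

def landis_koch_band(kappa):
    """Map kappa to the agreement bands quoted from Landis and Koch [26]."""
    if kappa <= 0:
        return "useless classifier"
    bands = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
             (0.80, "substantial"), (1.00, "almost perfect")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")

# Counts as printed in Table 3 (rows: actual normal, actual attack).
tn, fp, fn, tp = 19369, 948, 2084, 17573
accuracy = (tp + tn) / (tp + tn + fp + fn)
mcc_value = mcc(tp, tn, fp, fn)
kappa_value = cohen_kappa(tp, tn, fp, fn)
band = landis_koch_band(kappa_value)
print(f"accuracy={accuracy:.4f} MCC={mcc_value:.4f} kappa={kappa_value:.4f} ({band})")
```

Under these definitions the recomputed accuracy, MCC, and Kappa agree with the values in Table 4 to within about 0.001, and the Kappa value falls in the band labelled almost perfect agreement.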
Further analysis of the normal and anomaly responses of the 1st detection layer reveals an important result: the deep recursive neural network was able to detect DoS attacks with 0% FPR. Equally important, the DoS detection rate reached 97.83% at the 1st detection layer and 98.27% after the records were re-filtered through the second detection layer, as depicted in Figure 7.
Fig. 7. Graphical pipeline represents detection performance of DoS attack.
Consequently, when anomalous traffic is caught by the first detection layer and identified as DoS, the detection is 97% correct, which is a very important detection capability for an IoT security system. DoS attacks are considered prominent because they affect bandwidth, IoT network resources (devices), CPU, etc., leaving IoT devices inaccessible to legitimate users. Besides DoS detection, the first detection layer can detect the Probe attack category, as shown in Figure 8. Although Probe attacks are not as well known as DoS attacks in IoT networks, the first layer of our proposed system can detect them with an 84.38% detection rate, and Probe attacks that leak through the first layer are detected by the second layer, raising the overall detection rate to 97.36% as shown in Figure 8.
Fig. 8. Graphical pipeline represents detection performance of Probe attack.
Although the first detection layer shows high detection performance against DoS and Probe attacks, it shows a marked performance degradation against R2L and U2R attacks, as shown in Figure 9. A closer examination of the normal response of the first detection layer reveals {R2L, U2R} attacks that were only weakly detected by the first recursive network, which underlines the major role of the second filtering layer.
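The two-stage DoS and Probe figures quoted above imply how much of the first layer's leakage the second layer recovers. As a rough sketch, assume the second layer only sees attacks that the first layer missed, so that the overall detection rate is d1 + (1 - d1) * r, where d1 is the first-layer rate and r is the fraction of leaked attacks recovered. The function name `implied_recovery` and this decomposition are illustrative assumptions, not part of the proposed model.

```python
def implied_recovery(first_layer_dr, overall_dr):
    """Fraction r of first-layer misses recovered by the second layer,
    obtained by solving overall = d1 + (1 - d1) * r for r."""
    if not 0 <= first_layer_dr < 1:
        raise ValueError("first-layer detection rate must be in [0, 1)")
    return (overall_dr - first_layer_dr) / (1 - first_layer_dr)

# Detection rates quoted in the text for DoS and Probe.
dos_recovery = implied_recovery(0.9783, 0.9827)
probe_recovery = implied_recovery(0.8438, 0.9736)
print(f"DoS: {dos_recovery:.1%} of leaked attacks recovered")
print(f"Probe: {probe_recovery:.1%} of leaked attacks recovered")
```

Read this way, the second layer contributes far more to Probe detection than to DoS detection, simply because the first layer leaves much more Probe traffic undetected.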
Fig. 9. Graphical pipeline represents detection performance of R2L and U2R attacks.
Fig. 10. MSE profiles versus different number of iterations for (a), (b), (c) bad weights initialization and (d) proper weights initialization, revealed in a smooth, monotonically decreasing MSE profile.
The experimental results presented in Table 3 and Table 4 were obtained for [...] applied on a deep layered structure [...], where the dataset was normalized to the [...] range, with ξ = 2.25, b = 0.99 and α = 0.2. The backpropagation step size µ was set to 1.5 for the first hidden layer and 0.2 for successive ones. As with all intelligent learning models, the initial weights have a high impact on network convergence, as demonstrated in Figure 10 (a), (b), and (c) for different improper weight initializations.
On the other hand, adding a recursive gain ξ affects the stability and detection performance of 1st detection layer as elaborated in Figure 11, Figure 12 and Figure 13.
Fig. 11. Effect of recursive gain ξ on detection performance in terms of false positive rate (alarm rate).
Fig. 12. Effect of recursive gain ξ on detection performance in terms of detection rate, detection accuracy and the precision of detection.
Fig. 13. Effect of recursive gain ξ on detection performance in terms of detection rate, detection accuracy and the precision of detection.
The main duty of the second detection layer is to detect the hidden attacks that leaked into the normal response of the first detection layer. The focus is on this layer's ability to filter the difficult-to-catch attacks {R2L, U2R} robustly, without depending on the statistical distribution of these attacks, while also detecting the other attack types that leaked through the first filter. A detailed block diagram of the second filtering layer, along with the parameters/hyperparameters of the recursive network model, is depicted in Figure 14.
Fig. 14. Pictorial Block diagram of second detection layer associated with network parameters/hyperparameters.
Since the first layer was able to detect DoS attacks with 0% FPR, we excluded the 23,390 DoS records from the training dataset, as shown in the general block diagram of the proposed system in Figure 4, and used the rest for the recursive neural network of the second layer. We applied [...] for the hidden layers and [...] for the deep structure, where the dataset was normalized to the range [...]. The recursive gain ξ was set to 0.15, with b = 0.99 and α = 0.2. The backpropagation step size was set to 1.5 for the first hidden layer and 0.2 for successive ones. Table 5 lists the binary performance of the second filtering layer.
Table 5: Confusion matrix for binary classification of second detection layer.
| | Predicted Class: Normal | Predicted Class: Attack |
| Actual Class: Normal | 18341 | 1055 |
| Actual Class: Attack | 1123 | 961 |
Complementary to the results reported in Table 5, the recursive gain ξ, as well as the number of nodes (neurons) in the deep-layered structure of the second recursive network, has a major effect on various aspects of detection performance. Not only the sensitivity, false positive rate, and precision of the second layer but also the F-measure and the Matthews and Kappa coefficients are dramatically altered by the value of the recursive gain ξ, as demonstrated in Figure 15, Figure 16, and Figure 17.
Fig. 15. Effect of recursive gain ξ on the predictive performance of second detection layer in terms of false positive rate.
Fig. 16. Effect of recursive gain ξ on the predictive performance of second detection layer in terms of F1-score, MCC and Kappa coefficients.
Fig. 17. Effect of recursive gain ξ on the predictive performance of second detection layer in terms of detection rate, accuracy, and precision.
Since the previous state of the recursive network partially controls its current state, a question arises as to whether the second detection layer's ability to detect rare, hard-to-detect attacks is affected by the number of iterations applied in the training phase. Figure 18 and Figure 19 demonstrate the effect of the number of iterations on the model's detection performance, the former in terms of detection rate, accuracy, and precision, and the latter in terms of F1-score, MCC, and К.
Fig. 18. Effect of number of training iterations on system performance in terms of detection rate, accuracy and precision metrics of 2nd detection layer.
Fig. 19. Effect of number of training iterations on system performance in terms of F1-score, Kappa and MCC metrics of 2nd detection layer.
As can be seen from Figure 18 and Figure 19, after 200 training iterations the model shows relatively higher false positive rates at ξ = 0.15 and ξ = 0.20. Nevertheless, the model shows the best performance in terms of detection rate, accuracy, precision, F1-score, MCC, and К. As the goal is to optimize the predictive accuracy of the proposed model, we adopted ξ = 0.15 as the optimal setting. In other words, returning more than 35% of the previous network state back to the inputs results in a noticeable degradation in predictive performance, whereas removing the recursive paths altogether, i.e. setting ξ = 0, results in worse performance. Therefore, returning 15% of the previous state was adopted as optimal. Moreover, the network fails to converge at other ξ values, such as ξ = {0.5, 0.9, 1, 1.5, 2} and ξ ≥ 2.5. Table 6 compares the proposed detection model with other intrusion detection models/systems in terms of computational overhead.
Table 6. Performance comparison with other IoT security models in terms of execution time.
| Author, Year | Time | Method |
| Illy et al. (2019) [20] | Training time: 7 seconds for all records of KDDTrain+ dataset. Prediction time: less than 3 seconds for all records of KDDTest+ dataset. | Voting ensemble using KNN, random forest and boosting of decision trees. |
| Mohammadi and Sabokrou (2019) [18] | Each input record needs 45 µsec on average to be processed. | Semi-supervised learning composed of encoder, decoder and neural networks, both trained using adversarial learning algorithm. |
| Kumari and Varma (2017) [19] | Training time 3.42 seconds for 18,939 records. | Hybrid combination of active SVM and FCM. |
| Proposed Model | Each input record requires 66 µsec on average to be processed. | Multi-layered deep recursive neural networks. |
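Table 6 mixes per-record latencies with total run times, so a small conversion helps put the entries on a common footing. The sketch below is plain arithmetic on the figures in the table; the helper names are illustrative, the 40,000-record count is the test set size used in this section, and the comparison is only indicative because the cited works ran on different hardware and [19] reports training rather than prediction time.

```python
MICROSECOND = 1e-6  # seconds

def total_seconds(latency_us, n_records):
    """Total processing time for n_records at a fixed per-record latency."""
    return latency_us * MICROSECOND * n_records

def latency_us_per_record(total_s, n_records):
    """Average per-record latency in microseconds from a total run time."""
    return total_s / n_records / MICROSECOND

test_records = 40_000
proposed_total = total_seconds(66, test_records)        # proposed model
adversarial_total = total_seconds(45, test_records)     # Mohammadi and Sabokrou [18]
svm_fcm_latency = latency_us_per_record(3.42, 18_939)   # training time in [19]

print(f"proposed: {proposed_total:.2f} s for {test_records} records")
print(f"[18]: {adversarial_total:.2f} s for {test_records} records")
print(f"[19]: {svm_fcm_latency:.1f} µs per training record")
```

At 66 µsec per record, the 40,000 test records take roughly 2.6 seconds in total, which is the same order of magnitude as the whole-dataset prediction time reported for [20].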
The overall model performance is illustrated in Figure 20. Table 7 reports the detection rate per attack category, and Table 8 compares the proposed detection model with other intrusion detection models/systems.
Fig. 20. Overall performance of predicted model.
Table 7. Multi-class overall model Performance.
| Attack Category | Detection Rate |
| DoS | 98.27% |
| Probe | 97.35% |
| R2L | 64.93% |
| U2R | 77.25% |
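The per-category rates in Table 7 can be combined into a single binary detection rate by weighting each category by its share of attack records. The sketch below assumes the Table 7 rates apply to the test-set attack counts stated earlier for the 40,000-record test set; the variable names are illustrative.

```python
# Attack counts in the test set and per-category detection rates from Table 7.
test_counts = {"DoS": 13_855, "Probe": 3_446, "R2L": 2_079, "U2R": 277}
detection_rate = {"DoS": 0.9827, "Probe": 0.9735, "R2L": 0.6493, "U2R": 0.7725}

total_attacks = sum(test_counts.values())
detected = sum(test_counts[c] * detection_rate[c] for c in test_counts)
overall_dr = detected / total_attacks
print(f"{total_attacks} attacks, overall detection rate {overall_dr:.2%}")
```

Because DoS and Probe make up most of the attack records, the weighted rate stays close to 94%, in line with the binary detection rate reported for the proposed model in Table 8, even though the R2L and U2R rates are much lower.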
In our comparative analysis, we concentrate on works that specifically target the security of IoT systems and networks, as well as works that used artificial intelligence and machine learning techniques developed on the NSL-KDD dataset.
Table 8. Performance comparison of different classifiers with binary class NSL-KDD dataset.
| Author, Year | DR | ACC | PRE | FPR | F1-Score | MCC | К | FNR |
| Illy et al. (2019) [20] | - | 85.81% | - | - | - | - | - | - |
| Pajouh et al. (2019) [8] | 84.86% | - | - | 4.86% | - | - | - | - |
| Mohammadi and Sabokrou (2019) [18] | - | 91.39% | - | - | - | - | - | - |
| Kumari and Varma (2017) [19] | 99.6% | - | - | 5.86% | - | - | - | 0.5% |
| Proposed Model | 94.27% | 92.18% | 90.23% | 9.8% | 92.29% | 84.44% | 84.36% | 5.7% |
† DR: Detection Rate. ACC: Accuracy. PRE: Precision. FPR: False Positive Rate. MCC: Matthews correlation coefficient. К: Cohen's Kappa coefficient. FNR: False Negative Rate.
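The binary figures for the proposed model can be approximately recovered by combining the two confusion matrices. The sketch below assumes a simple cascade: an alarm raised by either layer counts as an attack prediction, and only records that the second layer passes as normal remain true or false negatives. The counts are those printed in Table 3 and Table 5; the function name `rates` and the cascade logic are illustrative assumptions.

```python
def rates(tp, tn, fp, fn):
    """Standard binary rates with attack as the positive class."""
    return {
        "DR": tp / (tp + fn),
        "ACC": (tp + tn) / (tp + tn + fp + fn),
        "PRE": tp / (tp + fp),
        "FPR": fp / (fp + tn),
        "FNR": fn / (fn + tp),
    }

# Alarms raised by layer 1 (Table 3) and the full layer-2 matrix (Table 5).
layer1 = {"tp": 17573, "fp": 948}
layer2 = {"tn": 18341, "fp": 1055, "fn": 1123, "tp": 961}

combined = rates(
    tp=layer1["tp"] + layer2["tp"],  # attacks caught by either layer
    fp=layer1["fp"] + layer2["fp"],  # false alarms raised by either layer
    tn=layer2["tn"],                 # normal records passed by both layers
    fn=layer2["fn"],                 # attacks missed by both layers
)
for name, value in combined.items():
    print(f"{name}: {value:.2%}")
```

Under this assumption the combined rates land within about 0.1 percentage points of the DR, ACC, PRE, FPR, and FNR values listed for the proposed model in Table 8.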
As shown in Table 8, our cascaded multilayered filtering using recursive neural networks achieved higher detection accuracy than the other methods that report it, and comparable performance in terms of detection rate and false positive rate. Kumari and Varma (2017) [19] report a better detection rate, FPR, and FNR but do not report accuracy, precision, F1-score, MCC, or К. Our proposed model, on the other hand, achieved almost perfect agreement, as reflected in MCC and К values of 84.44% and 84.36% respectively, which in turn indicates the robustness of the model against low-frequency attacks. Moreover, owing to the addition of recurrence elements, the model's detection performance is almost independent of the random frequency of the attacks, which makes it suitable for real-time IoT security applications.
4 CONCLUSION
In this paper, we have proposed a Fog computing-based intrusion detection model for IoT network security. The proposed model adopts a recurrent neural network trained by an enhanced version of the backpropagation algorithm. The performance evaluation reveals the effectiveness of adaptive cascaded filtering using the recursive structure of neural networks, where each network is tuned to different parameters/hyperparameters to enhance the detection of specific intrusion types. As a result, the model shows high sensitivity to DoS attacks, which are among the prominent attacks that thwart the development of IoT networks, while also detecting other attack categories such as Probe, R2L, and U2R at a competitive computational overhead, as each record requires 66 µsec on average to be processed. Thus, the proposed model is capable of working properly and efficiently in real-time environments.
5 ACKNOWLEDGEMENT
This work is fully supported by the Deanship of Scientific Research and Graduate Studies at Al-Hussein Bin Talal University, Jordan.
REFERENCES
[1] V. Balasubramanian, S. Otoum, M. Aloqaily, I. Al Ridhawi, Y. Jararweh, Low-latency vehicular edge: A vehicular infrastructure model for 5G, Simul. Model. Pract. Theory. 98 (2020) 101968. doi:10.1016/j.simpat.2019.101968.
[2] I. Al Ridhawi, M. Aloqaily, Y. Kotb, Y. Jararweh, T. Baker, A Profitable and Energy-Efficient Cooperative Fog Solution for IoT Services, IEEE Trans. Ind. Informatics. 3203 (2019) 1–1. doi:10.1109/tii.2019.2922699.
[3] D. Evans, The Internet of Things: How the Next Evolution of the Internet Is Changing Everything, (2011).
[4] I. Al Ridhawi, M. Aloqaily, B. Kantarci, Y. Jararweh, H.T. Mouftah, A continuous diversified vehicular cloud service availability framework for smart cities, Computer Networks. 145 (2018) 207–218.
[5] K.A.P. da Costa, J.P. Papa, C.O. Lisboa, R. Munoz, V.H.C. de Albuquerque, Internet of Things: A survey on machine learning-based intrusion detection approaches, Comput. Networks. 151 (2019) 147–157. doi:10.1016/j.comnet.2019.01.023.
[6] M. Quwaider, Y. Jararweh, A cloud supported model for efficient community health awareness, Pervasive and Mobile Computing. 28 (2016) 35–50.
[7] S. Alharbi, P. Rodriguez, R. Maharaja, P. Iyer, N. Bose, Z. Ye, FOCUS: A Fog computing-based security system for the Internet of Things, CCNC 2018 - 2018 15th IEEE Annu. Consum. Commun. Netw. Conf. 2018-Janua (2018) 1–5. doi:10.1109/CCNC.2018.8319238.
[8] H.H. Pajouh, R. Javidan, R. Khayami, A. Dehghantanha, K.R. Choo, A Two-Layer Dimension Reduction and Two-Tier Classification Model for Anomaly-Based Intrusion Detection in IoT Backbone Networks, IEEE Trans. Emerg. Top. Comput. 7 (2019) 314–323. doi:10.1109/TETC.2016.2633228.
[9] M. Tavallaee, E. Bagheri, W. Lu, A.A. Ghorbani, A Detailed Analysis of the KDD CUP 99 Data Set, (2009).
[10] E. Anthi, L. Williams, P. Burnap, Pulse: an adaptive intrusion detection for the internet of things, (2018) 35 (4 pp.)-35 (4 pp.). doi:10.1049/cp.2018.0035.
[11] E.M. Dovom, A. Azmoodeh, A. Dehghantanha, D.E. Newton, R.M. Parizi, H. Karimipour, Fuzzy pattern tree for edge malware detection and categorization in IoT, J. Syst. Archit. 97 (2019) 1–7. doi:10.1016/j.sysarc.2019.01.017.
[12] H. Wang, J. Gu, S. Wang, An effective intrusion detection framework based on SVM with feature augmentation, Knowledge-Based Syst. 136 (2017) 130–139. doi:10.1016/j.knosys.2017.09.014.
[13] H. Zhang, C.Q. Wu, S. Gao, Z. Wang, Y. Xu, Y. Liu, An Effective Deep Learning Based Scheme for Network Intrusion Detection, Proc. - Int. Conf. Pattern Recognit. 2018-Augus (2018) 682–687. doi:10.1109/ICPR.2018.8546162.
[14] N. Moustafa, J. Slay, UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set), in: 2015 Mil. Commun. Inf. Syst. Conf., 2015: pp. 1–6.
[15] N. Koroniotis, N. Moustafa, E. Sitnikova, J. Slay, Towards developing network forensic mechanism for botnet activities in the IoT based on machine learning techniques, Lect. Notes Inst. Comput. Sci. Soc. Telecommun. Eng. LNICST. 235 (2018) 30–44. doi:10.1007/978-3-319-90775-8_3.
[16] A. Dawoud, S. Shahristani, C. Raun, Deep learning and software-defined networks: Towards secure IoT architecture, Internet of Things. 3–4 (2018) 82–89. doi:10.1016/j.iot.2018.09.003.
[17] E. Hodo, X. Bellekens, A. Hamilton, P.-L. Dubouilh, E. Iorkyase, C. Tachtatzis, R. Atkinson, Threat analysis of IoT networks Using Artificial Neural Network Intrusion Detection System, (2016) 4–9.
[18] B. Mohammadi, M. Sabokrou, End-to-End Adversarial Learning for Intrusion Detection in Computer Networks, (2019) 1–4. http://arxiv.org/abs/1904.11577.
[19] V.V. Kumari, P.R.K. Varma, A semi-supervised intrusion detection system using active learning SVM and fuzzy c-means clustering, in: 2017 Int. Conf. I-SMAC (IoT Soc. Mobile, Anal. Cloud)(I-SMAC), 2017: pp. 481–485.
[20] P. Illy, G. Kaddoum, C.M. Moreira, K. Kaur, S. Garg, Securing Fog-to-Things Environment Using Intrusion Detection System Based On Ensemble Learning, (2019) 15–18. http://arxiv.org/abs/1901.10933.
[21] M. Aloqaily, I. Al Ridhawi, H. Bany Salameh, Y. Jararweh, Data and service management in densely crowded environments: Challenges, opportunities, and recent developments, IEEE Communications Magazine. 57 (4) (2019) 81–87.
[22] N. Tepedelenlioglu, A Fast New Algorithm for Training Feedforward Neural Networks, IEEE Trans. Signal Process. 40 (1992) 202–210. doi:10.1109/78.157194.
[23] S.S. Haykin, Kalman filtering and neural networks, Wiley Online Library, 2001.
[24] MATLAB, (2018). https://www.mathworks.com.
[25] M. Almiani, A. AbuGhazleh, A. Al-Rahayfeh, A. Razaque, Cascaded hybrid intrusion detection model based on SOM and RBF neural networks, Concurr. Comput. Pract. Exp. (2019) e5233. https://doi.org/10.1002/cpe.5233.
[26] J.R. Landis, G.G. Koch, The Measurement of Observer Agreement for Categorical Data, Biometrics. 33 (1977) 159. doi:10.2307/2529310.
Garg, Securing Fog-to-Things Environment Using Intrusion Detection System Based On Ensemble Learning, (2019) 15–18. http://arxiv.org/abs/1901.10933. Aloqaily, Moayad, Ismaeel Al Ridhawi, Haythem Bany Salameh, and Yaser Jararweh. "Data and service management in densely crowded environments: Challenges, opportunities, and recent developments." IEEE Communications Magazine 57, no. 4 (2019): 81-87. N. Tepedelenlioglu, A Fast New Algorithm for Training Feedforward Neural Networks, IEEE Trans. Signal Process. 40 (1992) 202–210. doi:10.1109/78.157194. S.S. Haykin, S.S. Haykin, Kalman filtering and neural networks, Wiley Online Library, 2001. MATLAB, (2018). https://www.mathworks.com. M. Almiani, A. AbuGhazleh, A. Al-Rahayfeh, A. Razaque, Cascaded hybrid intrusion detection model based on SOM and RBF neural networks, Concurr. Comput. Pract. Exp. (2019) e5233. https://doi.org/10.1002/cpe.5233. J.R. Landis, G.G. Koch, The Measurement of Observer Agreement for Categorical Data, Biometrics. 33 (1977) 159. doi:10.2307/2529310.