A novel intrusion detection system based on an optimal hybrid kernel extreme learning machine


Knowledge-Based Systems xxx (xxxx) xxx

Contents lists available at ScienceDirect

Knowledge-Based Systems journal homepage: www.elsevier.com/locate/knosys

A novel intrusion detection system based on an optimal hybrid kernel extreme learning machine✩ Lu Lv, Wenhai Wang, Zeyin Zhang, Xinggao Liu



State Key Laboratory of Industrial Control Technology, College of Control Science & Engineering, Zhejiang University, 310027 Hangzhou, PR China

Article info

Article history:
Received 23 August 2019
Received in revised form 7 February 2020
Accepted 10 February 2020
Available online xxxx

Keywords:
Intrusion detection system
Extreme learning machine
Gravitational search algorithm
Differential evolution
Kernel principal component analysis

Abstract: Intrusion detection is a challenging technology in the area of cyberspace security for protecting a system from malicious attacks. A novel, accurate and effective misuse intrusion detection system, which relies on specific attack signatures to distinguish between normal and malicious activities, is therefore presented to detect various attacks based on an extreme learning machine with a hybrid kernel function (HKELM). First, the derivation and proof of the proposed hybrid kernel are given. A combination of the gravitational search algorithm (GSA) and differential evolution (DE) algorithm is employed to optimize the parameters of the HKELM, which improves its global and local optimization abilities when predicting attacks. In addition, the kernel principal component analysis (KPCA) algorithm is introduced for dimensionality reduction and feature extraction of the intrusion detection data. Then, a novel intrusion detection approach, KPCA-DEGSA-HKELM, is obtained. The proposed approach is eventually applied to the classic benchmark KDD99 dataset, the real modern UNSW-NB15 dataset and the industrial intrusion detection dataset from the Tennessee Eastman process. The numerical results validate both the high accuracy and the time-saving benefit of the proposed approach. © 2020 Published by Elsevier B.V.

✩ No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.knosys.2020.105648.
∗ Corresponding author. E-mail address: [email protected] (X. Liu).

1. Introduction

With the increasing development of network technology, and particularly with the popularity of the Internet, cyber security has become the focus of a growing number of people [1,2]. As a new security defense technology, the intrusion detection system (IDS) can actively protect a network system from illegal external attacks. An IDS is designed to ensure the security of systems and can promptly detect abnormal phenomena [3]. Furthermore, an IDS can improve the reliability and security of systems by detecting and responding to various malicious behaviors. Generally, IDSs can be classified into two categories: anomaly detection systems (profile-based detection systems) and misuse detection systems (signature-based detection systems). Anomaly detection systems target behavior that deviates from a normal profile of the system, while misuse detection systems target behavior that matches a known attack scenario [4,5]. Although anomaly detection systems perform better at detecting unknown attacks, they usually yield a high false alarm rate. This limitation is addressed by misuse detection systems, which rely on specific attack signatures to distinguish between normal and malicious activities. However, these systems are directly influenced by the freshness of their detection rules; thus, improving the detection accuracy and learning speed of intrusion detection systems remains a challenging task [6].

Recently, considerable work has been done in the area of intrusion detection (ID), attempting to design anomaly and/or misuse detection systems that detect malicious attacks with a high detection rate and low false alarm rate. Tsang et al. [7] proposed an effective anomaly detection approach to extract both accurate and interpretable fuzzy rules from network traffic data for classification. Their fuzzy rule-based IDS was built on an agent-based evolutionary framework and carried out genetic feature selection for dimensionality reduction. Principal component analysis (PCA) was utilized successfully in ID by Salo et al. [8]; PCA can extract the most significant features by mapping the input dataset into an uncorrelated subspace. The k-nearest neighbor (KNN) method was used as a basic classifier to detect malicious attacks, providing effective misuse detection systems with a high accuracy and detection rate [9,10]. Xiang et al. [11] introduced a novel multilevel hybrid classifier based on Bayesian clustering and decision trees and adopted this method for the IDS. An intelligent signature-based detection system called Dendron was presented by Dimitrios et al. [6] to classify various types of attacks; this methodology combined the advantages of both decision trees and genetic algorithms to obtain accurate detection rules. Chan et al. [12] proposed a policy-enhanced

https://doi.org/10.1016/j.knosys.2020.105648 0950-7051/© 2020 Published by Elsevier B.V.

Please cite this article as: L. Lv, W. Wang, Z. Zhang et al., A novel intrusion detection system based on an optimal hybrid kernel extreme learning machine, Knowledge-Based Systems (2020) 105648, https://doi.org/10.1016/j.knosys.2020.105648.


fuzzy model with adaptive neuro-fuzzy inference system features, which can counter SOAP-related attacks with a high detection accuracy and low false positive rate. They also developed fuzzy associative rules to effectively counter SOAP-related and XML-related attacks in Web services and e-commerce applications [13,14]. Wathiq et al. [15] proposed a real-time multi-agent system for an adaptive intrusion detection system (RTMAS-AIDS) that allows the IDS to adapt to unknown attacks in real time; this method applies a hybrid of the support vector machine (SVM) and extreme learning machine (ELM) to classify normal behavior and known attacks. An effective SVM-based ID algorithm was presented by Tao et al. [16] to identify intrusions with strong results. In addition, improved ELM and SVM algorithms have been widely used in ID applications [17–20]. These advances have achieved great performance in detecting and reporting malicious attacks. Nevertheless, improving the accuracy and efficiency of the prediction model remains the primary goal of an IDS. The objective of this paper is to provide an accurate and effective misuse intrusion detection system based on machine learning techniques that relies on specific attack signatures to distinguish between normal and malicious activities with high accuracy and a fast learning speed. It is well known that determining a suitable configuration for a particular dataset is a demanding problem in machine learning. Therefore, numerous researchers have attempted to find the optimal parameters of machine learning models. Aburomman and Reaz [21] applied the particle swarm optimization (PSO) algorithm to the SVM-KNN ensemble method to create a classifier with better accuracy for ID. The authors also proposed a novel weighted SVM multiclass classifier based on differential evolution (DE) for the IDS [22].
ELMs are a popular area of research for detecting possible intrusions and attacks [23]. Ku and Zheng [24] proposed an improved learning algorithm named self-adaptive differential evolution extreme learning machine with a Gaussian kernel for classifying and detecting intrusions. Bostani and Sheikhan [25] introduced a hybrid binary gravitational search algorithm (GSA) for feature selection in IDSs; this method can find a good subset of features and achieve a high accuracy and detection rate. GSA is a heuristic optimization method with few parameters to be determined, offering a high convergence rate and strong local optimization ability. Meanwhile, the DE algorithm is a heuristic optimization method with a strong global optimization ability and great adaptability [26,27]. Unfortunately, both GSA and the DE algorithm have their own disadvantages: the former is easily trapped in a local optimum, and the latter's local optimization ability is relatively weak. An effective approach to deal with ID problems is therefore presented. First, considering the timeliness requirement of ID problems, an ELM method is selected as the basic model in the current work. Furthermore, to improve the accuracy of the ELM method, a hybrid kernel function combining the radial basis function (RBF) kernel with the polynomial kernel is derived and introduced into the ELM model. The proof that the proposed hybrid kernel function satisfies Mercer's theorem is also given. Second, taking advantage of both GSA and the DE algorithm, a hybrid differential evolution combined with gravitational search algorithm (DEGSA) is proposed to optimize the parameters of the proposed model, which improves both the local and global optimization abilities over those of the individual algorithms. Third, kernel principal component analysis (KPCA) is introduced for the dimensionality reduction and feature extraction of the nonlinear ID data.
The significance of this paper is summarized as follows.

• A new HKELM method with a hybrid kernel function is proposed that improves both the generalization and learning ability of the KELM method, aiming at providing accurate and efficient misuse intrusion detection methods. Furthermore, the proof of the proposed hybrid kernel function is provided in detail.
• In the context of DE and GSA, a hybrid algorithm, DEGSA, is proposed that combines the benefits of DE and GSA with the aim of improving both the local and global optimization abilities for detecting attacks.
• The KPCA algorithm is introduced for the dimensionality reduction and feature extraction of the intrusion detection data. Then, an effective intrusion detection approach, KPCA-DEGSA-HKELM, is obtained.
• The proposed approach is compared with other literature methods in an extensive testbed comprising three intrusion detection datasets, namely, the classic benchmark KDD99 dataset [28], the real modern UNSW-NB15 dataset [29] and the industrial intrusion detection dataset from the TE process [30]. These datasets include both host-based and network-based attacks from different platforms, which demonstrates the effectiveness of the proposed method.
• The proposed approach is evaluated and compared with other literature methods using several classification evaluation metrics. The experimental results show that the proposed approach is superior to other methods in terms of the accuracy (Acc), mean accuracy (MAcc), mean F-score (MF) and attack accuracy (AAcc) evaluation metrics. Furthermore, the proposed approach outperforms the CPSO-SVM method in all the overall evaluation metrics while achieving higher computational efficiency with less training and testing time.

The rest of this paper is organized as follows. In Section 2, the extreme learning machine with a hybrid kernel function (HKELM) is proposed, DEGSA is presented to optimize the parameters of the HKELM model, and the KPCA algorithm is introduced for feature extraction. Section 3 outlines the implementation of the proposed algorithms. The experimental environment and model evaluation metrics are illustrated in Section 4. In Section 5, the experimental results are provided to validate the accuracy and efficiency of the proposed approach. Finally, Section 6 contains some concluding remarks.

2. Approach description

2.1. Extreme learning machine with hybrid kernel function (HKELM)

2.1.1. Extreme learning machine (ELM)

An ELM is an effective feedforward neural network with a single hidden layer, as presented by Huang et al. [31,32]. Traditional neural networks need to set a large number of parameters to train the network, and they easily converge to local optima. The ELM, in contrast, only needs to set the number of hidden nodes in the network, without adjusting the weights of the input layer or the biases of the hidden layer, and it more readily reaches a globally optimal solution [33]. Therefore, the ELM has a faster convergence rate and is more efficient in terms of learning performance. The network structure of the ELM is shown in Fig. 1. For the given training dataset T_0 = {(x_j, tar_j), j = 1, ..., N}, where x_j = [x_{j1}, ..., x_{jn}] ∈ R^n is the input feature vector and tar_j = [tar_{j1}, ..., tar_{jm}] ∈ R^m is the corresponding target vector, the goal is to obtain the optimal model for further testing tasks. In Fig. 1, y_j = [y_{j1}, ..., y_{jm}] ∈ R^m is the output vector obtained via the ELM network. The ELM model can then be expressed by the following formula:

$$y_j = \sum_{i=1}^{l} \beta_i g_i(x_j) = \sum_{i=1}^{l} \beta_i g(\alpha_i \cdot x_j + c_i), \quad j = 1, \ldots, N \tag{1}$$
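Training the ELM of Eq. (1) reduces to drawing the hidden-layer parameters at random and solving for the output weights with a pseudoinverse, as derived below. The following is a minimal NumPy sketch of that recipe (our illustration, not the authors' implementation; the sigmoid activation and toy data are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_train(X, T, l):
    """Train an ELM: random input weights/biases, closed-form output weights."""
    n = X.shape[1]
    alpha = rng.standard_normal((n, l))          # input-to-hidden weights (random, fixed)
    c = rng.standard_normal(l)                   # hidden biases (random, fixed)
    H = 1.0 / (1.0 + np.exp(-(X @ alpha + c)))   # hidden-layer output, sigmoid g(.)
    beta = np.linalg.pinv(H) @ T                 # beta = H^+ T (Moore-Penrose solve)
    return alpha, c, beta

def elm_predict(X, alpha, c, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ alpha + c)))
    return H @ beta

# Toy regression task: N = 200 samples, n = 2 features, m = 1 output
X = rng.uniform(-1, 1, size=(200, 2))
T = np.sin(X[:, :1]) + X[:, 1:]
alpha, c, beta = elm_train(X, T, l=50)
err = np.abs(elm_predict(X, alpha, c, beta) - T).max()
```

Only `beta` is learned; `alpha` and `c` stay at their random draws, which is what gives the ELM its fast, single-shot training.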


where α_i is the weight vector between the input layer and the hidden layer, β_i is the weight vector between the hidden layer and the output layer, c_i is the bias of the ith hidden node, and g(·) is the activation function of the hidden layer. The node parameters α_i and c_i of the hidden layer are randomly assigned; as a consequence, only the number of hidden-layer nodes l needs to be determined in the ELM model. If the error between the output y and the target tar can be approximated to zero, then the following equation is obtained:

$$\sum_{j=1}^{N} \| tar_j - y_j \| = 0 \tag{2}$$

Combining Eq. (1) with Eq. (2), there exist β_i, α_i and c_i that satisfy:

$$\sum_{i=1}^{l} \beta_i g(\alpha_i \cdot x_j + c_i) = tar_j, \quad j = 1, \ldots, N \tag{3}$$

Eq. (3) can be converted into matrix form as follows,

$$\underbrace{\begin{bmatrix} g(\alpha_1 \cdot x_1 + c_1) & \cdots & g(\alpha_l \cdot x_1 + c_l) \\ \vdots & \ddots & \vdots \\ g(\alpha_1 \cdot x_N + c_1) & \cdots & g(\alpha_l \cdot x_N + c_l) \end{bmatrix}}_{H_{N \times l} = [h(x_1), \ldots, h(x_N)]^T} \cdot \underbrace{\begin{bmatrix} \beta_1^T \\ \vdots \\ \beta_l^T \end{bmatrix}}_{\beta_{l \times m}} = \underbrace{\begin{bmatrix} tar_1^T \\ \vdots \\ tar_N^T \end{bmatrix}}_{T_{N \times m}} \tag{4}$$

that is,

$$H \beta = T \tag{5}$$

where T is the output matrix of the target and H and β are the output matrix and weight matrix of the hidden layer, respectively. Therefore, the weight matrix of the hidden layer β can be calculated by the following equation:

$$\beta = H^{+} T \tag{6}$$

where H^+ is the Moore–Penrose generalized inverse of matrix H, which can be obtained as follows,

$$H^{+} = H^{T} (H H^{T})^{-1} \tag{7}$$

Fig. 1. Network structure of the ELM model.

2.1.2. HKELM approach

The prediction accuracy of the ELM model may be relatively low when the model is applied to unknown testing datasets. As a result, the parameter I/C was introduced into HH^T by Huang et al. to improve the generalization ability of the ELM model, yielding the kernel extreme learning machine (KELM). The output function of the KELM can be expressed as follows [34],

$$f(x) = h(x)\beta = h(x) H^{T} \left( \frac{I}{C} + H H^{T} \right)^{-1} T \tag{8}$$

where the positive constant C is the penalty parameter and I is the identity matrix. The kernel function of the KELM is defined as follows,

$$\Omega_{KELM} = H H^{T}, \qquad \Omega_{KELM\,i,j} = h(x_i) h(x_j) = K(x_i, x_j) \tag{9}$$

Therefore, the model function of the KELM can be written as:

$$f(x) = \begin{bmatrix} K(x, x_1) \\ \vdots \\ K(x, x_N) \end{bmatrix}^{T} \left( \frac{I}{C} + \Omega_{KELM} \right)^{-1} T \tag{10}$$

The selection of the kernel function can greatly influence the performance of the KELM model. Consequently, it is significant to find an appropriate kernel function for the KELM model. The polynomial kernel function and radial basis function (RBF) kernel function are two common kernel functions, which are combined together as the hybrid kernel of the KELM in the current work.

2.1.2.1. Polynomial kernel function. The expression of the polynomial kernel function is stated as follows,

$$K_{poly}(x, x_i) = (x \cdot x_i + b)^{p} \tag{11}$$

where b and p are the constant and exponent parameters of the polynomial kernel function, respectively. The polynomial kernel function is a typical global kernel function, which means that its corresponding KELM model possesses a strong generalization ability and weak learning ability [35,36]. Fig. 2 demonstrates the curves of the polynomial kernel function with different b and p, where the test point is selected as x_i = 0.2. In Fig. 2(a), the value of parameter p is set as p = 2, while the value of parameter b changes from 0.2 to 1.0. In Fig. 2(b), the value of parameter b is set as b = 1, while the value of p changes from 1 to 5. As Fig. 2 indicates, the output of the polynomial kernel function increases with the input. Furthermore, the sample points both near and far away from the test point have an influence on the output of the kernel function, which verifies the strong generalization ability of the polynomial kernel function. However, the test point has no apparent learning capability, revealing the weak learning ability of the polynomial kernel function.

2.1.2.2. RBF kernel function. The expression of the RBF kernel function is indicated as follows,

$$K_{RBF}(x, x_i) = \exp\left( -\frac{\|x - x_i\|^2}{2\sigma^2} \right) = \exp\left( -\frac{\|x - x_i\|^2}{a} \right) \tag{12}$$

where a = 2σ² is the exponent parameter of the RBF kernel function. The RBF kernel function is a typical local kernel function, which means that the corresponding KELM model has a strong learning ability and weak generalization ability [35,36]. Fig. 3


demonstrates the curves of the RBF kernel function with different a. In Fig. 3, the value of a changes from 0.02 to 0.50, which determines the width of the RBF kernel function. Unlike the polynomial kernel function, only the sample points near the test point can affect the output of the RBF kernel function, which indicates the poor generalization ability of the RBF kernel function. As seen from Fig. 3, the closer the sample point is to the test point, the stronger the learning ability will be. Compared with Fig. 2, the learning ability of the RBF kernel function is superior to that of the polynomial kernel function. Fig. 3 also shows that as the value of a decreases, the learning ability of the RBF kernel function increases while the generalization ability decreases.

2.1.2.3. Hybrid kernel function. The generalization ability of the polynomial kernel function is superior to that of the RBF kernel function, while its learning ability is poorer. Therefore, to improve both the generalization and learning ability of the KELM, a combination of the two kernel functions is proposed as the hybrid kernel, which takes advantage of both kernels. In the current work, the linear weight method is utilized, and the equation of the new hybrid kernel function is expressed as follows,

$$K_{hybrid}(x, x_i) = w \cdot K_{RBF} + (1 - w) \cdot K_{poly}, \quad w \in [0, 1] \tag{13}$$

that is,

$$K_{hybrid}(x, x_i) = w \cdot \exp\left( -\frac{\|x - x_i\|^2}{a} \right) + (1 - w) \cdot (x \cdot x_i + b)^{p}, \quad w \in [0, 1] \tag{14}$$

where the constant w is the weight coefficient of the hybrid kernel K_hybrid.

Proposition 1. As the RBF kernel function K_RBF and the polynomial kernel function K_poly satisfy Mercer's theorem, the proposed function K_hybrid is also a kernel function satisfying Mercer's theorem.

Proof. K_RBF and K_poly are kernel functions; thus, their Gram matrices K_RBF and K_poly are positive semidefinite, that is, for any vector λ, the following conditions are satisfied:

$$\lambda^T K_{RBF} \lambda \ge 0, \qquad \lambda^T K_{poly} \lambda \ge 0 \tag{15}$$

Then, the following expressions can be obtained:

$$\begin{cases} w \lambda^T K_{RBF} \lambda \ge 0 \\ (1 - w) \lambda^T K_{poly} \lambda \ge 0 \end{cases} \;\Rightarrow\; \begin{cases} \lambda^T (w K_{RBF}) \lambda \ge 0 \\ \lambda^T [(1 - w) K_{poly}] \lambda \ge 0 \end{cases} \tag{16}$$

Therefore, wK_RBF and (1 − w)K_poly are positive semidefinite matrices. According to Eq. (13), λ^T K_hybrid λ can be rewritten as follows,

$$\lambda^T K_{hybrid} \lambda = \lambda^T [w \cdot K_{RBF} + (1 - w) \cdot K_{poly}] \lambda = \underbrace{\lambda^T (w \cdot K_{RBF}) \lambda}_{\ge 0} + \underbrace{\lambda^T [(1 - w) \cdot K_{poly}] \lambda}_{\ge 0} \tag{17}$$

Combining Eqs. (16) and (17), the following expression is obtained:

$$\lambda^T K_{hybrid} \lambda \ge 0 \tag{18}$$

It can be observed from Eq. (18) that K_hybrid is a positive semidefinite matrix, that is, K_hybrid satisfies Mercer's theorem; thus, the proposed function K_hybrid is a kernel function, which can be applied to improve the performance of the KELM approach. The proof is completed. □

Fig. 2. Curves of the polynomial kernel function with different b and p.

Fig. 3. Curves of the RBF kernel function with different a.
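To make the hybrid kernel concrete, the sketch below (our illustration with arbitrary parameter values, not the authors' code) builds K_hybrid of Eq. (14), numerically spot-checks Proposition 1 by confirming the Gram matrix has no negative eigenvalues for several weights w, and then uses the kernel in a KELM-style solve as in Eq. (10):

```python
import numpy as np

def k_hybrid(A, B, w=0.5, a=0.18, b=1.0, p=2):
    """Hybrid kernel of Eq. (14): w * K_RBF + (1 - w) * K_poly."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)   # squared distances
    return w * np.exp(-d2 / a) + (1 - w) * (A @ B.T + b) ** p

def hkelm_fit(X, T, C_pen=100.0, **kw):
    """KELM training as in Eq. (10): solve (I/C + Omega) alpha = T."""
    Omega = k_hybrid(X, X, **kw)
    return np.linalg.solve(np.eye(len(X)) / C_pen + Omega, T)

def hkelm_predict(Xnew, X, alpha, **kw):
    return k_hybrid(Xnew, X, **kw) @ alpha

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (80, 2))
T = np.sin(3 * X[:, :1]) * X[:, 1:]          # toy regression target

# Proposition 1: the hybrid Gram matrix stays positive semidefinite for any w
min_eig = min(np.linalg.eigvalsh(k_hybrid(X, X, w=w)).min()
              for w in (0.0, 0.3, 0.7, 1.0))

alpha = hkelm_fit(X, T)
mse = float(np.mean((hkelm_predict(X, X, alpha) - T) ** 2))
```

The eigenvalue check mirrors the proof: a matrix is positive semidefinite exactly when its smallest eigenvalue is nonnegative (up to floating-point error).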


Fig. 4 illustrates the curves of the hybrid kernel function with different w. The values of parameters a, b and p are set as a = 0.18, b = 1 and p = 2, respectively, while the value of parameter w changes from 0 to 1. It is observed from Fig. 4 that the larger the value of w is, the stronger the hybrid kernel function's learning ability, while the corresponding generalization ability is weakened. Moreover, all the parameters of K_hybrid can affect the performance of the HKELM model. Therefore, it is significant to determine the optimal parameters a_opt, b_opt, p_opt and w_opt of the hybrid kernel function, which achieves both great learning and generalization abilities in the resulting HKELM model.

Fig. 4. Curves of the hybrid kernel function with different w.

2.2. Differential evolution combined with gravitational search algorithm (DEGSA)

2.2.1. Differential evolution algorithm

The differential evolution (DE) algorithm is a particle-based global optimization algorithm, which was proposed by Storn and Price in 1997 [37]. Compared to evolutionary algorithms, the DE algorithm exhibits a strong global search capacity and robustness [38,39]. The optimization problem within a D-dimensional space is shown as follows,

$$\min f(X_1, X_2, \ldots, X_N) \tag{19}$$

subject to,

$$X_i^L \le X_i \le X_i^U, \quad i = 1, \ldots, N \tag{20}$$

where X_i = [x_{i,1}, \ldots, x_{i,D}], i = 1, \ldots, N is a candidate solution and X_i^L and X_i^U denote the minimum and maximum of X_i, respectively. Then, the DE algorithm can be described as follows,

Step 1. Initialization. The boundary constraints of the search space are given as follows,

$$x_{i,j}^L \le x_{i,j}(0) \le x_{i,j}^U, \quad i = 1, \ldots, N; \; j = 1, \ldots, D \tag{21}$$

where x_{i,j}(0) is the initial population, which should cover the entire space. Then, the initial value is generated by

$$x_{i,j}(0) = x_{i,j}^L + rand(0, 1) \cdot (x_{i,j}^U - x_{i,j}^L) \tag{22}$$

where rand(0, 1) is a random number between 0 and 1.

Step 2. Mutation. The mutation operation is utilized to generate a mutant vector, V_i(t) = [v_{i,1}(t), \ldots, v_{i,D}(t)], and the mutation rule is executed by the following equation:

$$V_i(t + 1) = X_{r1}(t) + F \times (X_{r2}(t) - X_{r3}(t)), \quad 1 \le i \ne r1 \ne r2 \ne r3 \tag{23}$$

where t denotes the tth generation and F is the scaling parameter of the two vectors.

Step 3. Crossover. A trial vector, U_i(t) = [u_{i,1}(t), \ldots, u_{i,D}(t)], is created through the crossover operation. The corresponding process can be described as follows,

$$u_{i,j}(t + 1) = \begin{cases} v_{i,j}(t + 1), & \text{if } rand(0, 1) \le CR \\ x_{i,j}(t), & \text{otherwise} \end{cases} \tag{24}$$

where CR is the crossover rate, which can assist the model in avoiding local optima and maintains the diversity of the population [26].

Step 4. Selection. The fitness of U_i(t) is compared with that of X_i(t), and the better result is selected by the DE algorithm. In other words, if the fitness of the population after crossover improves, then the results are updated, that is,

$$X_i(t + 1) = \begin{cases} U_i(t), & \text{if } fit(U_i(t)) \le fit(X_i(t)) \\ X_i(t), & \text{otherwise} \end{cases} \tag{25}$$

where fit(·) is a fitness value, which is calculated by the following equation:

$$fit(x) = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 } \tag{26}$$

where y_i and ŷ_i are the measured and predicted results, respectively.

2.2.2. Gravitational search algorithm

The gravitational search algorithm (GSA) was proposed by Esmat et al. in 2009 [40]. In GSA, the agents are regarded as objects, and all objects tend to move towards the objects with heavy masses. The algorithm realizes communication between objects through gravitational force, thus guiding all the objects to the optimal solution in the search space [41,42]. Suppose the number of objects is N; the position of the ith object can be defined as follows,

$$x_i = (x_i^1, \ldots, x_i^d, \ldots, x_i^D), \quad i = 1, \ldots, N \tag{27}$$

where x_i^d denotes the position of the ith object in the dth dimension. The force acting on the ith object from the jth object is described as,

$$F_{ij}^d = G(t) \frac{M_{pi}(t) M_{aj}(t)}{B_{ij}(t) + \varepsilon} \left( x_j^d(t) - x_i^d(t) \right) \tag{28}$$

where M_pi(t) and M_aj(t) are the active gravitational masses related to the ith object and jth object, respectively, B_ij(t) is the Euclidean distance between the ith object and the jth object, and ε is a small constant. G(t) is the gravitational constant, namely,

$$G(t) = G_0 \exp\left( -\gamma \frac{t}{iter_{max}} \right) \tag{29}$$

where γ is the descending coefficient, G_0 is the initial gravitational constant, and iter_max is the maximum number of iterations. In GSA, the total force that acts on the ith object is calculated as,

$$F_i^d(t) = \sum_{j \in K_{best},\, j \ne i}^{NP} rand_j \, F_{ij}^d(t) \tag{30}$$
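The GSA updates above, together with the DE operators of Section 2.2.1 and the odd/even alternation described in Section 2.2.3 below, can be sketched end-to-end as follows. This is a toy illustration under our own simplifying assumptions, not the authors' implementation: the objective is a stand-in RMSE-style function, every object attracts every other (rather than only the Kbest subset), and since the acceleration divides the force by the object's own mass, only the attracting mass M_j appears in the update. Constants G0 and γ are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)

def fitness(x):
    """Stand-in RMSE-style objective, in the spirit of Eq. (26); optimum at 0.5."""
    return np.sqrt(np.mean((x - 0.5) ** 2))

def de_step(X, F=0.5, CR=0.9):
    """One DE generation: mutation (23), crossover (24), greedy selection (25)."""
    N, D = X.shape
    out = X.copy()
    for i in range(N):
        r1, r2, r3 = rng.choice([k for k in range(N) if k != i], 3, replace=False)
        v = X[r1] + F * (X[r2] - X[r3])              # mutant vector
        u = np.where(rng.random(D) <= CR, v, X[i])   # trial vector
        if fitness(u) <= fitness(X[i]):              # keep the better candidate
            out[i] = u
    return out

def gsa_step(X, V, t, iter_max, G0=100.0, gamma=20.0, eps=1e-9):
    """One simplified GSA generation: masses, forces, and motion updates."""
    fit = np.array([fitness(x) for x in X])
    m = (fit - fit.max()) / (fit.min() - fit.max() - eps)  # minimum problem: best -> 1
    M = m / (m.sum() + eps)                                # normalized masses
    G = G0 * np.exp(-gamma * t / iter_max)                 # decaying G(t), Eq. (29)
    acc = np.zeros_like(X)
    for i in range(len(X)):
        for j in range(len(X)):
            if i != j:
                d = np.linalg.norm(X[j] - X[i]) + eps
                # a_i = F_i / M_i, so M_i cancels and only M_j remains
                acc[i] += rng.random() * G * M[j] * (X[j] - X[i]) / d
    V = rng.random((len(X), 1)) * V + acc                  # velocity update
    return X + V, V                                        # position update

N, D, iter_max = 20, 4, 30
X = rng.uniform(-2.0, 2.0, (N, D))
V = np.zeros((N, D))
for t in range(1, iter_max + 1):
    if t % 2 == 1:
        X, V = gsa_step(X, V, t, iter_max)   # odd generations: GSA
    else:
        X = de_step(X)                       # even generations: DE
best = min(fitness(x) for x in X)
```

The alternation is the key design choice: DE's greedy selection never worsens an individual, while the GSA moves inject exploration around the heavier (fitter) objects.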


where rand_j is a random number between 0 and 1 and K_best is the set of the first k objects with the best fitness values. By the law of motion, the acceleration of the ith object is given as follows,

$$ac_i^d(t) = \frac{F_i^d(t)}{M_i(t)} \tag{31}$$

where M_i(t) is the inertial mass of the ith object. The next velocity of an object is the sum of the current velocity and its acceleration. Therefore, the updates in the velocity and position of the ith object can be described as,

$$v_i^d(t + 1) = rand_i \times v_i^d(t) + ac_i^d(t) \tag{32}$$

$$x_i^d(t + 1) = x_i^d(t) + v_i^d(t + 1) \tag{33}$$

Assuming that the gravitational mass equals the inertial mass, namely,

$$M_{ai} = M_{pi} = M_{ii} = M_i, \quad i = 1, \ldots, N \tag{34}$$

then the updates in the gravitational and inertial masses are indicated as follows,

$$m_i(t) = \frac{fit_i(t) - worst(t)}{best(t) - worst(t)} \tag{35}$$

$$M_i(t) = \frac{m_i(t)}{\sum_{j=1}^{N} m_j(t)} \tag{36}$$

where fit_i(t) is a fitness value. For a minimum problem, best(t) and worst(t) are defined by the following equations:

$$best(t) = \min_{j \in 1, \ldots, NP} fit_j(t) \tag{37}$$

$$worst(t) = \max_{j \in 1, \ldots, NP} fit_j(t) \tag{38}$$

For a maximum problem, best(t) and worst(t) are defined as,

$$best(t) = \max_{j \in 1, \ldots, NP} fit_j(t) \tag{39}$$

$$worst(t) = \min_{j \in 1, \ldots, NP} fit_j(t) \tag{40}$$

In the current work, the intrusion detection problem is regarded as a minimum problem; that is, Eqs. (37) and (38) are used for the intrusion detection problem.

2.2.3. The proposed DEGSA

The global optimization ability of the DE algorithm is strong, and it can accurately find the global optima of the search space with the differential information [43]. Nevertheless, the local optimization ability of the DE algorithm is relatively weak. On the contrary, the local optimization ability of GSA is strong, while its global optimization ability is relatively weak. As the iteration progresses, GSA requires more time to reach the optimal solution due to the emergence of a large number of heavy inertial-mass objects. Therefore, GSA and the DE algorithm are combined to improve both the local and global optimization abilities of the proposed DEGSA. In the current work, GSA and the DE algorithm are performed alternately, i.e., GSA is implemented at the odd generations, while the DE algorithm is applied at the even generations. The introduction of the DE algorithm increases the diversity of the original GSA, which assists DEGSA in exploring the search space smartly while the individual approaches avoid becoming stuck at local optima [43,44]. The flowchart of DEGSA is given graphically in Fig. 5.

Based on the above optimization procedures, a detailed description of the proposed DEGSA is summarized as follows,

(1) Initialize the number of objects N and the position x_i and velocity v_i of the ith object. Determine the parameters of DEGSA, e.g., the descending coefficient γ, the initial gravitational constant G_0, etc. Set the maximal number of iterations iter_max and let the initial iteration be t = 1.
(2) Calculate the fitness value according to Eq. (26).
(3) If the iteration number t is odd, go to step (4); otherwise, go to step (5).
(4) Activate GSA.
    (a) Calculate the gravitational constant G according to Eq. (29).
    (b) For the minimum problem, calculate the best and worst fitness values best(t) and worst(t) according to Eqs. (37)–(38).
    (c) Calculate the total force on the ith object F_i^d(t) according to Eq. (30), obtain the acceleration of the ith object ac_i^d(t) according to Eq. (31), and calculate the inertial mass of the ith object M_i(t) according to Eqs. (35)–(36).
    (d) Update the velocity and position of the ith object according to Eqs. (32)–(33).
    (e) If i ≤ N, go back to step (4)-(c); else, go to step (6).
(5) Activate the DE algorithm.
    (a) Generate a mutant vector V_i(t) according to Eq. (23).
    (b) Generate a trial vector U_i(t) according to Eq. (24).
    (c) If fit(U_i(t)) ≤ fit(X_i(t)), go to step (5)-(d); otherwise, go back to step (5)-(a).
    (d) If i ≤ N, go back to step (5)-(a); else, go to step (6).
(6) Update the parameters according to the new fitness.
(7) If t ≤ iter_max, go back to step (3); otherwise, go to step (8).
(8) Output the updated solutions as the optimal parameters. Then, the proposed DEGSA is finished.

2.3. Kernel principal component analysis (KPCA)

Principal component analysis (PCA) is a classic approach utilized for feature extraction and dimensionality reduction [45]. PCA can deal well with linear relationships between variables; however, when dealing with nonlinear relations, the contribution rate of each principal component is too scattered. As a result, comprehensive variables that effectively represent the original samples cannot be found precisely, leading to the low effectiveness of the PCA method. The KPCA approach was proposed by Scholkopf et al. [46] as an improvement to the original PCA method and can deal with nonlinear problems effectively. KPCA maps the nonlinear training samples X = [x_1, \ldots, x_n]^T ∈ R^{n×d} into a high-dimensional feature space Γ through the nonlinear mapping function Φ [47], namely,

$$\Phi: \begin{cases} R^{n \times d} \to \Gamma \\ X \mapsto \Phi(X) \end{cases} \tag{41}$$

The inseparable data in the input space becomes separable in the high-dimensional feature space Γ by using the simple nonlinear mapping function Φ; the PCA method is then utilized to extract the features in Γ, which realizes the separation of the nonlinear training samples. The ith feature after the transformation of KPCA can be expressed as follows,

$$q_i = \frac{1}{\sqrt{\mu_i}} \sigma_i^T \left[ k(x_1, x_{new}), \ldots, k(x_n, x_{new}) \right]^T, \quad i = 1, \ldots, p \tag{42}$$

Please cite this article as: L. Lv, W. Wang, Z. Zhang et al., A novel intrusion detection system based on an optimal hybrid kernel extreme learning machine, Knowledge-Based Systems (2020) 105648, https://doi.org/10.1016/j.knosys.2020.105648.

L. Lv, W. Wang, Z. Zhang et al. / Knowledge-Based Systems xxx (xxxx) xxx


Fig. 5. Flowchart of DEGSA.

where µi , i = 1, . . . , p are the p largest positive eigenvalues satisfying µ1 ≥ µ2 ≥ · · · ≥ µp and σi , i = 1, . . . , p are the corresponding eigenvectors. k(xn , xnew ) is the kernel function of xn and a new column vector sample xnew , which calculates the inner products of xn and xnew in the high-dimensional feature space Γ .
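A numpy sketch of Eq. (42) follows. The RBF kernel and its width are assumptions (the paper's kernel choice may differ), and the kernel-centering step that practical KPCA implementations usually add is omitted to stay close to the formula as written.

```python
import numpy as np

def rbf(x, y, gamma=0.5):
    # kernel k(x, y) = exp(-gamma * ||x - y||^2); an assumed choice for illustration
    return np.exp(-gamma * np.sum((x - y) ** 2))

def kpca_fit(X, p, gamma=0.5):
    """Eigendecompose the n-by-n kernel matrix and keep the p leading pairs (mu_i, sigma_i)."""
    n = X.shape[0]
    K = np.array([[rbf(X[i], X[j], gamma) for j in range(n)] for i in range(n)])
    mu, sigma = np.linalg.eigh(K)          # eigenvalues in ascending order
    order = np.argsort(mu)[::-1][:p]       # p largest positive eigenvalues
    return mu[order], sigma[:, order]

def kpca_transform(X, mu, sigma, x_new, gamma=0.5):
    """Eq. (42): q_i = sigma_i^T [k(x_1, x_new), ..., k(x_n, x_new)]^T / sqrt(mu_i)."""
    k_vec = np.array([rbf(x, x_new, gamma) for x in X])
    return (sigma.T @ k_vec) / np.sqrt(mu)

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 5))               # 30 training samples, 5 features
mu, sigma = kpca_fit(X, p=3)
q = kpca_transform(X, mu, sigma, X[0])
print(q.shape)                             # 3 extracted features for the new sample
```

Fitting touches only the training samples; projecting a new sample needs just the n kernel evaluations in k_vec, which is what makes the KPCA front end cheap at test time.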

Table 1
The definitions of evaluation metrics for the multiclass ID problem.

| Name | Meaning | Notation |
| --- | --- | --- |
| Precision | The proportion of all instances predicted as the ith class that are correctly classified as the ith class | P_i (i = 0, 1, …, c − 1) |
| Recall | The proportion of all instances actually belonging to the ith class that are correctly classified as the ith class | R_i (i = 0, 1, …, c − 1) |
| F-score | The balance between the precision and the recall | F_i (i = 0, 1, …, c − 1) |
| Accuracy | The frequency of correct decisions | Acc |
| Mean accuracy | The average recall among all the classes of the dataset | MAcc |
| Mean F-score | The average F-score among all the classes of the dataset | MF |
| Attack accuracy | The accuracy rate for the attack classes | AAcc |
| False alarm rate | The frequency of falsely predicting a normal instance as an attack | FAR |
| False normal rate | The frequency of falsely predicting an attack instance as normal | FNR |

3. Algorithm outline

The ID process based on the KPCA-DEGSA-HKELM approach is demonstrated in Fig. 6 and described in detail as follows.

Algorithm. The proposed KPCA-DEGSA-HKELM approach
Step 1: Input: the ID dataset and the initial parameters of the KPCA-DEGSA-HKELM model.
Step 2: Perform dimensionality reduction and feature extraction on the input dataset using the KPCA method.
Step 3: Train the HKELM model with the preprocessed training dataset and optimize the model parameters of the HKELM with the hybrid algorithm DEGSA. When the number of iterations reaches the maximum value iter_max, the optimization process stops, and the optimal parameters are obtained.
Step 4: Evaluate the obtained optimal HKELM model with the testing dataset.
Step 5: Output: the evaluation metrics of the ID problem.

4. Environment and model evaluation

All the experiments in the current work are executed on a PC with a 2.6 GHz Intel Core i5 processor and 8.0 GB of RAM. To validate the effectiveness of the KPCA-DEGSA-HKELM classification model, several multiclass classification evaluation metrics are utilized to gauge the models. In the current work, the precision, recall and F-score are used as the evaluation metrics of each class; moreover, the accuracy, mean accuracy, mean F-score, attack accuracy, false alarm rate and false normal rate are utilized as the overall evaluation metrics. The detailed definitions of the evaluation metrics are provided in Table 1.
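The five-step procedure of Section 3 can be sketched end-to-end. The sketch below substitutes scikit-learn stand-ins (KernelPCA for Step 2, an SVC with a tiny grid search in place of HKELM + DEGSA for Step 3) and synthetic data for the ID dataset, since the point here is the shape of the pipeline rather than the paper's exact models.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.svm import SVC

# Step 1: input data (synthetic two-class stand-in for an ID dataset)
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
y = (X[:, 0] + X[:, 1] ** 2 > 1).astype(int)
X_train, X_test, y_train, y_test = X[:200], X[200:], y[:200], y[200:]

# Step 2: KPCA dimensionality reduction / feature extraction
kpca = KernelPCA(n_components=5, kernel="rbf", gamma=0.1).fit(X_train)
Z_train, Z_test = kpca.transform(X_train), kpca.transform(X_test)

# Step 3: train the classifier and tune its kernel parameters;
# the small grid search stands in for the DEGSA optimization loop
best_acc, best_model = -1.0, None
for C in (1.0, 10.0):
    for gamma in (0.1, 1.0):
        model = SVC(C=C, gamma=gamma).fit(Z_train, y_train)
        acc = model.score(Z_train, y_train)
        if acc > best_acc:
            best_acc, best_model = acc, model

# Steps 4-5: evaluate the tuned model on the testing set and report the metric
test_acc = best_model.score(Z_test, y_test)
print(test_acc)
```

Swapping the stand-ins for HKELM and DEGSA leaves the control flow unchanged: reduce, tune on training data, then evaluate once on held-out data.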

In Table 1, the different categories of the network connection in the experiments are denoted as normal (labeled as 0) and attacks (labeled as 1, 2, …, c − 1), where c denotes the number of categories of the network connection. Take the KDD99 dataset as an example, which contains the four attack categories DoS, PRB, U2R and R2L. A confusion matrix of a classification experiment is shown in Table 2, where N_ij represents the number of instances of the ith kind of network connection that are predicted as the jth kind of network connection. Then, all the evaluation metrics can be calculated as follows,

P_i = TP_i / (TP_i + FP_i) = N_ii / ∑_{j=0}^{c−1} N_ji,  i = 0, 1, …, c − 1    (43)
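The per-class and overall metrics defined in this section can be collected into one helper. This is a sketch: the equation-number comments follow the reading of Eqs. (43)–(51) used here, and the FAR/FNR forms follow the verbal definitions in Table 1 with class 0 taken as normal.

```python
import numpy as np

def id_metrics(N):
    """Multiclass ID metrics from a confusion matrix N, where N[i, j] counts
    actual class i predicted as class j and class 0 is 'normal'."""
    N = np.asarray(N, dtype=float)
    tp = np.diag(N)
    precision = tp / N.sum(axis=0)                           # Eq. (43)
    recall = tp / N.sum(axis=1)                              # Eq. (44)
    f_score = 2 * recall * precision / (recall + precision)  # Eq. (45)
    acc = tp.sum() / N.sum()                                 # Eq. (46)
    macc = recall.mean()                                     # Eq. (47)
    mf = f_score.mean()                                      # Eq. (48)
    aacc = recall[1:].mean()                                 # Eq. (49)
    far = 1.0 - recall[0]                                    # Eq. (50): normal flagged as attack
    fnr = N[1:, 0].sum() / (N[1:, 0].sum() + tp[1:].sum())   # Eq. (51): attacks flagged as normal
    return dict(P=precision, R=recall, F=f_score,
                Acc=acc, MAcc=macc, MF=mf, AAcc=aacc, FAR=far, FNR=fnr)

# toy 3-class example: normal plus two attack classes
m = id_metrics([[90, 5, 5],
                [2, 45, 3],
                [1, 4, 45]])
print(round(m["Acc"], 3))    # (90 + 45 + 45) / 200 = 0.9
```

Note that FNR uses only attacks-predicted-as-normal in the numerator, so an attack misrouted to another attack class lowers its recall but not the FNR.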




Fig. 6. ID process based on the KPCA-DEGSA-HKELM approach.

Table 2
An example of a confusion matrix. Rows give the actual class; columns give the classified class.

| Actual class | Normal (0) | DoS (1) | PRB (2) | U2R (3) | R2L (4) |
| --- | --- | --- | --- | --- | --- |
| Normal (0) | N_00 | N_01 | N_02 | N_03 | N_04 |
| DoS (1) | N_10 | N_11 | N_12 | N_13 | N_14 |
| PRB (2) | N_20 | N_21 | N_22 | N_23 | N_24 |
| U2R (3) | N_30 | N_31 | N_32 | N_33 | N_34 |
| R2L (4) | N_40 | N_41 | N_42 | N_43 | N_44 |

R_i = TP_i / (TP_i + FN_i) = N_ii / ∑_{j=0}^{c−1} N_ij,  i = 0, 1, …, c − 1    (44)

F_i = 2 · R_i · P_i / (R_i + P_i),  i = 0, 1, …, c − 1    (45)

Acc = ∑_{i=0}^{c−1} TP_i / ∑_{i=0}^{c−1} (TP_i + FN_i) = ∑_{i=0}^{c−1} N_ii / ∑_{i=0}^{c−1} ∑_{j=0}^{c−1} N_ij    (46)

MAcc = (1/c) ∑_{i=0}^{c−1} R_i    (47)

MF = (1/c) ∑_{i=0}^{c−1} F_i    (48)

AAcc = (1/(c − 1)) ∑_{i=1}^{c−1} TP_i / (TP_i + FN_i) = (1/(c − 1)) ∑_{i=1}^{c−1} R_i    (49)

FAR = FN_0 / (TP_0 + FN_0) = 1 − R_0    (50)

FNR = FP_0 / (FP_0 + ∑_{i=1}^{c−1} TP_i) = ∑_{j=1}^{c−1} N_j0 / (∑_{j=1}^{c−1} N_j0 + ∑_{i=1}^{c−1} N_ii)    (51)

where c = 5, 10 and 5 denote the number of categories for the KDD99 dataset, UNSW-NB15 dataset and TE dataset, respectively; true positive (TP_i) is the number of correctly classified instances of the ith class; false positive (FP_i) is the number of instances with an actual class other than the ith but incorrectly classified as the ith class; and false negative (FN_i) is the number of instances with the ith actual class incorrectly classified as another class.

Fig. 7. F-scores and mean F-scores for the Poly_KELM, RBF_KELM and HKELM methods with the KDD99 dataset.

5. Case study

5.1. Case 1: Intrusion detection of the KDD99 dataset

5.1.1. Dataset description

The KDD99 dataset is selected as a standard benchmark database to evaluate the effectiveness of the proposed models. In the KDD99 dataset, each instance consists of 41 features and a label, where the label belongs to either the normal type or a specific attack type. As Table 3 shows, there are 23 types of attacks in total, which can be divided into four major attack categories [28]: DoS (denial of service), PRB (probing), U2R (user to root) and R2L (remote to local).

As the full dataset of KDD99 (18 M; 743 M uncompressed) is cumbersome for training machine learning algorithms, the 10% subset of the dataset (2.1 M; 75 M uncompressed) is used by the majority of researchers. Thus, this subset, which maintains the initial characteristics of the full dataset, is chosen as the experiment dataset in the current work. Table 4 provides the detailed information of the instances in the datasets, where the training and two testing datasets are denoted as T0, T1 and T2, respectively. The training dataset T0 together with the testing dataset T1 are used to demonstrate the effectiveness of the proposed HKELM, DEGSA-HKELM and KPCA-DEGSA-HKELM models. Furthermore, to be consistent with other literature works such as the KDD99 winner [48], another testing dataset T2 is applied to compare the performance of the proposed KPCA-DEGSA-HKELM model with the models of other research.

5.1.2. Experimental results and discussion

Firstly, the training dataset T0 and testing dataset T1 are used to demonstrate the performance of the proposed HKELM, DEGSA-HKELM and KPCA-DEGSA-HKELM models.

(1) HKELM

When the model parameters are set as a = 100, b = 15, p = 4, and w = 0.9, the confusion matrices of the Poly_KELM,




Table 3
Attack categories of the KDD99 dataset.

| Class | Meaning | Attacks of KDD99 |
| --- | --- | --- |
| DoS | Denial of service | back, land, neptune, pod, smurf, teardrop |
| PRB | Surveillance and other means of probing | ipsweep, nmap, portsweep, satan |
| U2R | Unauthorized access to local (root) privileges | buffer_overflow, loadmodule, perl, rootkit |
| R2L | Unauthorized access from a remote machine | ftp_write, guess_passwd, imap, multihop, phf, spy, warezclient, warezmaster |

Table 4
The instances (number of instances per class) of training and testing datasets for the KDD99 dataset.

| Class | Training dataset T0 | Testing dataset T1 | Testing dataset T2 | 10% KDD99 dataset |
| --- | --- | --- | --- | --- |
| Normal | 200 | 1000 | 10000 | 97278 |
| DoS | 60 | 500 | 40000 | 391458 |
| PRB | 40 | 500 | 400 | 4107 |
| U2R | 30 | 52 | 52 | 52 |
| R2L | 60 | 1000 | 100 | 1126 |

RBF_KELM and proposed HKELM methods are given in Table 5(a)–(c). To demonstrate the performance of Poly_KELM, RBF_KELM and HKELM more visually, Fig. 7 compares the three methods on the F-score of each class and the mean F-score. In Fig. 7, the F-scores of Normal and PRB and the mean F-score obtained by the HKELM method are higher than those of the other two methods. In addition, the overall evaluation metrics of the three methods are shown in Table 6. It is clear that the HKELM method is better than the other two methods on all the overall evaluation metrics.

Table 5
Confusion matrices of the KDD99 dataset T1. Rows give the actual class; columns give the classified class.

(a) By the Poly_KELM method

| Actual class | Normal | DoS | PRB | U2R | R2L | Recall (%) |
| --- | --- | --- | --- | --- | --- | --- |
| Normal | 940 | 0 | 0 | 31 | 29 | 94.00 |
| DoS | 1 | 497 | 2 | 0 | 0 | 99.40 |
| PRB | 48 | 0 | 411 | 16 | 25 | 82.20 |
| U2R | 6 | 0 | 0 | 41 | 5 | 78.85 |
| R2L | 67 | 1 | 0 | 17 | 915 | 91.50 |
| Precision (%) | 89.27 | 99.80 | 99.54 | 48.42 | 96.00 |  |

(b) By the RBF_KELM method

| Actual class | Normal | DoS | PRB | U2R | R2L | Recall (%) |
| --- | --- | --- | --- | --- | --- | --- |
| Normal | 978 | 0 | 2 | 4 | 16 | 97.80 |
| DoS | 3 | 463 | 34 | 0 | 0 | 92.60 |
| PRB | 55 | 15 | 427 | 0 | 3 | 85.40 |
| U2R | 10 | 0 | 0 | 33 | 9 | 63.46 |
| R2L | 64 | 7 | 1 | 7 | 921 | 92.10 |
| Precision (%) | 88.11 | 95.46 | 92.03 | 75.00 | 97.05 |  |

(c) By the HKELM method

| Actual class | Normal | DoS | PRB | U2R | R2L | Recall (%) |
| --- | --- | --- | --- | --- | --- | --- |
| Normal | 965 | 0 | 0 | 23 | 12 | 96.50 |
| DoS | 2 | 496 | 2 | 0 | 0 | 99.20 |
| PRB | 32 | 0 | 429 | 15 | 24 | 85.80 |
| U2R | 4 | 0 | 0 | 46 | 2 | 88.46 |
| R2L | 78 | 1 | 0 | 11 | 910 | 91.00 |
| Precision (%) | 89.27 | 99.80 | 99.54 | 48.42 | 95.99 |  |

(a) The parameters are set as b = 15 and p = 4. (b) The parameter is set as a = 100. (c) The parameters are set as a = 100, b = 15, p = 4, and w = 0.9.

Table 6
Comparison of the overall evaluation metrics with the KDD99 dataset T1.

| Method | Acc (%) | MAcc (%) | MF (%) | AAcc (%) | FNR (%) |
| --- | --- | --- | --- | --- | --- |
| Poly_KELM | 91.87 | 89.19 | 85.15 | 87.99 | 6.14 |
| RBF_KELM | 92.46 | 86.27 | 87.71 | 83.39 | 6.68 |
| HKELM | 93.25 | 92.19 | 88.08 | 91.12 | 5.81 |

For the proposed HKELM method, the appropriate values of the hybrid kernel function parameters a, b, p and w in Eq. (14) need to be selected. By utilizing the HKELM method that is constructed upon the training dataset T0 and testing dataset T1,

the testing results with different values of a, b, p and w are shown in Fig. 8(a)–(d). As shown in Fig. 8, the values of these four parameters affect the accuracy of the classification; therefore, it is significant to determine an appropriate set of the four parameter values.

(2) DEGSA-HKELM

It is difficult to quickly and precisely find the optimal parameter values via human experience. As a result, DEGSA, which integrates two swarm intelligence optimization algorithms, is introduced to adaptively obtain the optimal parameter values in the current work. Meanwhile, the DE-HKELM and GSA-HKELM methods are considered for comparison with the DEGSA-HKELM method proposed in the paper. In the current work, the internal parameters of the DE branch of the DEGSA-HKELM method are consistent with those of the DE-HKELM method, which are set as follows: the population size is N_DE = 10; the maximum number of iterations is iter_max = 20; the lower bound of the scaling parameter F in Eq. (23) is F_L = 0.2; the upper bound of the scaling parameter F in Eq. (23) is F_U = 0.8; and the crossover rate in Eq. (24) is CR = 0.2. In addition, the internal parameters of the GSA branch of the DEGSA-HKELM method are consistent with those of the GSA-HKELM method, which are set as follows: the number of agents is N_GSA = 20; the maximum number of iterations is iter_max = 20; the small constant in Eq. (28) is ε = 2^−52; the initial gravitational constant in Eq. (29) is G_0 = 300; and the descending coefficient in Eq. (29) is γ = 20.

Table 7 shows the detailed F-score of each class obtained by DE-HKELM, GSA-HKELM and DEGSA-HKELM. From Table 7, the F-score of each class obtained by DEGSA-HKELM is higher than that of the other two methods. In particular, the F-score of U2R obtained by DE-HKELM is 77.78%, by GSA-HKELM is 81.48% and by DEGSA-HKELM is 85.46%. DEGSA-HKELM improves the F-score of U2R by approximately 7.68% and 3.98% compared to that of DE-HKELM and GSA-HKELM, respectively.
Furthermore, the mean F-score obtained by DE-HKELM is 92.77%, by GSA-HKELM is 93.61% and by DEGSA-HKELM is 94.86%. DEGSA-HKELM achieves percentage increases of approximately 2.09% and 1.25% for the mean F-score when compared to that of DE-HKELM and GSA-HKELM, respectively.
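To make the GSA branch's settings concrete, one generation can be sketched with the standard GSA mass and acceleration rules. Note that the exponential decay G(t) = G_0 · exp(−γ · t / iter_max) is a common reading of Eq. (29), not necessarily the paper's exact form, and the sphere objective is a stand-in.

```python
import numpy as np

# GSA branch settings from the text: G0 = 300, gamma = 20, eps = 2**-52,
# N_GSA = 20 agents, iter_max = 20 iterations
G0, GAMMA, EPS, ITER_MAX = 300.0, 20.0, 2.0 ** -52, 20

rng = np.random.default_rng(3)
pop = rng.uniform(-5, 5, size=(20, 4))      # 20 agents, 4 HKELM parameters
fit = np.sum(pop ** 2, axis=1)              # stand-in objective (minimization)

t = 10                                      # some generation index
G = G0 * np.exp(-GAMMA * t / ITER_MAX)      # assumed decay rule for Eq. (29)

# masses from fitness: the best agent receives the largest normalized mass
m = (fit.max() - fit) / (fit.max() - fit.min() + EPS)
M = m / m.sum()

# acceleration of agent i: sum over j of rand * G * M_j * (x_j - x_i) / (dist + eps)
# (the agent's own mass cancels when dividing force by mass)
acc = np.zeros_like(pop)
for i in range(len(pop)):
    for j in range(len(pop)):
        if i != j:
            diff = pop[j] - pop[i]
            acc[i] += rng.random() * G * M[j] * diff / (np.linalg.norm(diff) + EPS)

vel = acc                                   # v(t+1) = rand * v(t) + a(t), with v(0) = 0 here
pop = pop + vel
print(pop.shape)
```

Because G decays with t, early generations take large gravitational steps (exploration) and late generations take small ones (exploitation), which is why the DE branch is needed to keep diversity high.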




Fig. 8. Testing results for the KDD99 dataset with different values of a, b, p, and w.


Table 7
The detailed F-score of each class obtained by DE-HKELM, GSA-HKELM and DEGSA-HKELM with the KDD99 dataset T1.

| Class | DE-HKELM (%) | GSA-HKELM (%) | DEGSA-HKELM (%) |
| --- | --- | --- | --- |
| Normal | 94.52 | 94.70 | 95.52 |
| DoS | 99.20 | 99.20 | 99.40 |
| PRB | 97.24 | 97.24 | 97.45 |
| U2R | 77.78 | 81.48 | 85.46 |
| R2L | 95.11 | 95.43 | 96.50 |
| Mean | 92.77 | 93.61 | 94.86 |

Table 8
Comparison of the overall evaluation metrics for the KDD99 dataset T1.

| Method | Acc (%) | MAcc (%) | MF (%) | AAcc (%) | FNR (%) |
| --- | --- | --- | --- | --- | --- |
| DE-HKELM | 95.61 | 93.09 | 92.77 | 91.59 | 5.21 |
| GSA-HKELM | 95.84 | 93.96 | 93.61 | 92.68 | 5.01 |
| DEGSA-HKELM | 96.59 | 95.58 | 94.86 | 94.70 | 4.12 |

Fig. 9. F-scores and mean F-scores for DE-HKELM, GSA-HKELM and DEGSA-HKELM with the KDD99 dataset.

In addition, to indicate the performance of DEGSA more intuitively, the F-scores and mean F-scores of DE-HKELM, GSA-HKELM and DEGSA-HKELM are shown in Fig. 9. The column bars containing slashes, horizontal lines and pure black present the F-scores obtained by DE-HKELM, GSA-HKELM and DEGSA-HKELM, respectively. It is evident that the F-scores for the five classes and the overall mean F-score obtained by DEGSA-HKELM are higher than those of the other two methods, especially the F-score of the U2R class.

Table 8 gives the comparison of the overall evaluation metrics for DE-HKELM, GSA-HKELM and DEGSA-HKELM. The accuracy, mean accuracy, mean F-score and attack accuracy of DEGSA-HKELM are higher than those of DE-HKELM and GSA-HKELM, while the false normal rate of DEGSA-HKELM is lower than that of the other two methods. These results show that the hybrid optimization algorithm DEGSA is superior to DE and GSA in determining the optimal parameters to improve the performance of the HKELM method.

(3) KPCA-DEGSA-HKELM

The KPCA algorithm is introduced to reduce the impact of meaningless or less important features on the classification results and computational efficiency. The comparisons between DEGSA-HKELM and KPCA-DEGSA-HKELM are shown in Table 9. In Table 9, the accuracy, mean accuracy and attack accuracy of KPCA-DEGSA-HKELM are 96.69%, 95.66% and 94.85%, which are higher than those of DEGSA-HKELM with 96.59%, 95.58% and 94.70%, respectively. In addition, the false normal rate means the probability that actual attack records are misclassified as normal records, which has significant meaning in ID systems. The false normal rate of KPCA-DEGSA-HKELM is 3.73%, which achieves a percentage decrease of 9.47% compared to that of DEGSA-HKELM with 4.12%. Furthermore, the testing time of DEGSA-HKELM is 0.033581 s, and that of KPCA-DEGSA-HKELM is


0.012569 s. In detail, KPCA-DEGSA-HKELM achieves a percentage decrease of 62.57% in time compared to that of DEGSA-HKELM, which indicates the great computational efficiency of the proposed KPCA-DEGSA-HKELM approach.

(4) Comparisons and discussion

The proposed KPCA-DEGSA-HKELM approach is also tested with the testing dataset T2, the results of which are comparable to those of other literature works. Table 10 gives the comparison among the solutions achieved by the proposed approach and those of the literature methods. As shown in Table 10, the current work has superior precision for DoS and U2R attacks when compared with that of the other methods. Moreover, the accuracy (Acc), mean accuracy (MAcc), mean F-score (MF) and attack accuracy (AAcc) evaluation metrics of the proposed KPCA-DEGSA-HKELM are 99.00%, 95.38%, 87.21% and 94.47%, respectively, which are higher than those of the other methods. The mean F-score of the current work is 87.21%, while that of KDDwinner [48] is 58.87%, CSVAC [5] is 66.20%, CPSO-SVM [18] is 71.28% and Dendron [6] is 85.77%; the mean F-score of the current work is thus improved by approximately 28.34%, 21.01%, 15.93% and 1.44% over that of KDDwinner, CSVAC, CPSO-SVM and Dendron, respectively. The attack accuracy of the current work is 94.47%, while that of KDDwinner [48] is 83.74%, CSVAC [5] is 69.83%, CPSO-SVM [18] is 92.62% and Dendron [6] is 87.50%; the attack accuracy of the current work is thus improved by approximately 10.73%, 24.64%, 1.85% and 6.97% compared to that of KDDwinner, CSVAC, CPSO-SVM and Dendron, respectively. As a result, the proposed approach has the ability to improve the performance on the KDD99 problem.

In addition, to demonstrate the advantage in computational efficiency of the current work, the results (accuracy, training and testing time) of CPSO-SVM [18] and the proposed KPCA-DEGSA-HKELM with the same training dataset T0 are shown in Table 11. It is noted that all the results of CPSO-SVM and the current work are obtained on the same computational platform. In Table 11, the accuracy of the current work with testing datasets T1 and T2 is 96.69% and 99.00%, respectively, which is higher than that of the CPSO-SVM method. Furthermore, the training and testing times of CPSO-SVM with the testing dataset T1 are 19.915382 s and 0.041214 s, and those of KPCA-DEGSA-HKELM are 13.204581 s and 0.012569 s. Specifically, KPCA-DEGSA-HKELM achieves savings of 33.70% in training time and 69.50% in testing time when compared to those of CPSO-SVM. In addition, the training and testing times of CPSO-SVM with the testing dataset T2 are 20.047230 s and 0.426238 s, while those of KPCA-DEGSA-HKELM are 14.830142 s and 0.168058 s, respectively. In detail, KPCA-DEGSA-HKELM achieves savings of 26.02% in training time and 60.57% in testing time when compared to those of CPSO-SVM. The results of Table 11 reveal the time-saving benefits of KPCA-DEGSA-HKELM for the KDD99 dataset.

5.2. Case 2: Intrusion detection with the UNSW-NB15 dataset

5.2.1. Dataset description

UNSW-NB15 was proposed by Nour et al. [29] as a modernized dataset reflecting contemporary network traffic characteristics and new low-footprint attack scenarios. This dataset was created at the Australian Centre for Cyber Security (ACCS) utilizing the IXIA tool, which generated a representative mix of real modern normal traffic and synthetic abnormal network traffic in a synthetic environment. The UNSW-NB15 dataset is quite different from the previous KDD99 dataset and reflects a more recent and complex threat environment. The UNSW-NB15 dataset contains approximately 2540044 data instances, and each instance consists of 49 features. As Table 12 shows, this dataset includes 9 types of attack in total.



Table 9
Comparison of the overall evaluation metrics between DEGSA-HKELM and KPCA-DEGSA-HKELM with the KDD99 dataset T1.

| Method | Acc (%) | MAcc (%) | MF (%) | AAcc (%) | FAR (%) | FNR (%) | Testing time (s) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| DEGSA-HKELM | 96.59 | 95.58 | 94.86 | 94.70 | 0.90 | 4.12 | 0.033581 |
| KPCA-DEGSA-HKELM | 96.69 | 95.66 | 94.38 | 94.85 | 1.10 | 3.73 | 0.012569 |

Table 10
Comparison of the solutions achieved by the proposed approach with those of other literature methods with the KDD99 dataset T2. The first five columns give the per-class precision.

| Method | Normal (%) | DoS (%) | PRB (%) | U2R (%) | R2L (%) | Acc (%) | MAcc (%) | MF (%) | AAcc (%) | FAR (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| KDDwinner [48] | 99.45 | 97.12 | 83.32 | 13.16 | 8.40 | 92.71 | 81.91 | 58.87 | 83.74 | 25.39 |
| SVM [49] | 99.30 | 99.50 | 97.50 | 19.70 | 28.80 | 95.70 | N/A | N/A | N/A | 0.70 |
| CSVAC [5] | 99.91 | 99.72 | 65.74 | 42.59 | 20.47 | 94.86 | 75.26 | 66.20 | 69.83 | 3.04 |
| CPSO-SVM [18] | 96.87 | 99.98 | 63.61 | 11.08 | 50.27 | 98.05 | 93.45 | 71.28 | 92.62 | 3.26 |
| RTMAS-AIDS [15] | 97.89 | 99.79 | 91.86 | 24.68 | 35.90 | 95.86 | N/A | N/A | N/A | 2.13 |
| Dendron [6] | 99.36 | 99.12 | 82.83 | 52.63 | 79.54 | 98.85 | 89.85 | 85.77 | 87.50 | 0.75 |
| Current work | 96.35 | 99.99 | 89.57 | 54.55 | 68.15 | 99.00 | 95.38 | 87.21 | 94.47 | 0.94 |

Table 11
The results of the CPSO-SVM method and the current work for different KDD99 testing datasets.

| Testing dataset | Acc: CPSO-SVM (%) | Acc: Current work (%) | Training time: CPSO-SVM (s) | Training time: Current work (s) | Time saved | Testing time: CPSO-SVM (s) | Testing time: Current work (s) | Time saved |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| T1 | 94.35 | 96.69 | 19.915382 | 13.204581 | 33.70% | 0.041214 | 0.012569 | 69.50% |
| T2 | 98.05 | 99.00 | 20.047230 | 14.830142 | 26.02% | 0.426238 | 0.168058 | 60.57% |

The results are obtained on the same computational platform.

Table 12
Attack categories of the UNSW-NB15 dataset.

| Class | Meaning |
| --- | --- |
| Generic | A technique that works against all block-ciphers (with a given block and key size), without consideration of the structure of the block-cipher |
| Exploits | The attacker knows of a security problem within an operating system or a piece of software and leverages that knowledge by exploiting the vulnerability |
| Fuzzers | Attempting to cause a program or network to suspend by feeding it randomly generated data |
| DoS | A malicious attempt to make a server or a network resource unavailable to users, usually by temporarily interrupting or suspending the services of a host connected to the Internet |
| Reconnaissance | Contains all strikes that can simulate attacks that gather information |
| Analysis | It contains different attacks, including port scan, spam and html file penetrations |
| Backdoors | A technique in which a system security mechanism is bypassed stealthily to access a computer or its data |
| Shellcode | A small piece of code used as the payload in the exploitation of software vulnerability |
| Worms | Attacker replicates itself to spread to other computers. Often, it uses a computer network to spread itself, relying on the security failures of the target computer to access it |

Fig. 10. F-scores and mean F-scores for CPSO-SVM, Dendron and KPCA-DEGSA-HKELM with the UNSW-NB15 dataset.

The UNSW-NB15 dataset has been divided into two subsets, namely, the training set and testing set [29], which contain 175341 and 82332 records, respectively. It is noted that the partitioned dataset has only 43 features with the class label,

removing 6 features from the original dataset. The training set and testing set are used by the majority of researchers; therefore, they are chosen to be the experiment datasets in the current work. Table 13 indicates the detailed information of the datasets,




Table 13
The instances (number of instances per class) of training and testing datasets for the UNSW-NB15 dataset.

| Class | UNSW-NB15 training set | UNSW-NB15 testing set | Training dataset D0 | Testing dataset D1 |
| --- | --- | --- | --- | --- |
| Normal | 56000 | 37000 | 3420 | 30786 |
| Generic | 40000 | 18871 | 366 | 3291 |
| Exploits | 33393 | 11132 | 760 | 6849 |
| Fuzzers | 18184 | 6062 | 485 | 4353 |
| DoS | 12264 | 4089 | 172 | 1546 |
| Reconnaissance | 10491 | 3496 | 270 | 2433 |
| Analysis | 2000 | 677 | 45 | 401 |
| Backdoors | 1746 | 583 | 40 | 306 |
| Shellcode | 1133 | 378 | 40 | 338 |
| Worms | 130 | 44 | 22 | 22 |
| Total | 175341 | 82332 | 5620 | 50325 |

Table 14
Confusion matrix of the UNSW-NB15 dataset D1. Rows give the actual class; columns give the classified class.

| Actual class | Norm. | Gene. | Expl. | Fuzz. | DoS | Reco. | Anal. | Back. | Shell. | Worms | Recall (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Norm. | 30043 | 7 | 208 | 310 | 6 | 52 | 17 | 6 | 134 | 3 | 97.59 |
| Gene. | 44 | 3014 | 126 | 42 | 30 | 14 | 1 | 0 | 20 | 0 | 91.58 |
| Expl. | 363 | 15 | 5895 | 176 | 136 | 118 | 9 | 12 | 121 | 4 | 86.07 |
| Fuzz. | 958 | 8 | 82 | 3107 | 32 | 66 | 6 | 3 | 90 | 1 | 71.38 |
| DoS | 121 | 28 | 648 | 141 | 395 | 98 | 17 | 7 | 90 | 1 | 25.55 |
| Reco. | 102 | 6 | 172 | 38 | 37 | 1976 | 6 | 0 | 96 | 0 | 81.22 |
| Anal. | 54 | 0 | 69 | 62 | 70 | 31 | 101 | 0 | 13 | 1 | 25.19 |
| Back. | 30 | 0 | 47 | 68 | 56 | 34 | 15 | 41 | 15 | 0 | 13.40 |
| Shell. | 77 | 0 | 4 | 11 | 0 | 31 | 0 | 0 | 215 | 0 | 63.61 |
| Worms | 1 | 0 | 12 | 0 | 0 | 0 | 0 | 0 | 0 | 9 | 40.91 |
| Precision (%) | 94.50 | 97.92 | 81.16 | 78.56 | 51.84 | 81.65 | 58.72 | 59.42 | 27.08 | 47.37 |  |

Norm.: Normal, Gene.: Generic, Expl.: Exploits, Fuzz.: Fuzzers, Reco.: Reconnaissance, Anal.: Analysis, Back.: Backdoors, Shell.: Shellcode.

where the training and testing datasets are denoted as D0 and D1, respectively. D0 and D1 are applied to compare the performance of the proposed KPCA-DEGSA-HKELM model with other literature methods.

5.2.2. Experimental results and discussion

The confusion matrix derived from the testing process of the proposed KPCA-DEGSA-HKELM approach is given in Table 14. Table 15 shows the detailed F-score of each class obtained by CPSO-SVM [18], Dendron [6] and the proposed KPCA-DEGSA-HKELM approach. From Table 15, the F-scores for the classes Normal, Generic, Exploits, Fuzzers, DoS, Reconnaissance, Analysis, Shellcode and Worms achieved by the KPCA-DEGSA-HKELM approach are higher than those of the other two methods, and the F-score of the Backdoors class obtained by the KPCA-DEGSA-HKELM approach is higher than that of the CPSO-SVM method. Particularly, the F-scores for the DoS, Reconnaissance and Worms classes are improved by over 10% when compared with those of the other two methods. Moreover, the mean F-score provided by KPCA-DEGSA-HKELM is 60.37%, while that of CPSO-SVM is 46.49% and that of Dendron is 48.81%; the KPCA-DEGSA-HKELM approach achieves percentage increases of approximately 13.88% and 11.56% in mean F-score when compared to that of CPSO-SVM and Dendron, respectively.

In addition, to indicate the performance of KPCA-DEGSA-HKELM more intuitively, the F-scores and mean F-scores of CPSO-SVM, Dendron and KPCA-DEGSA-HKELM are shown in Fig. 10. The column bars containing slashes, horizontal lines and pure black are the F-scores obtained by CPSO-SVM, Dendron and KPCA-DEGSA-HKELM, respectively. It is apparent that the F-scores for the nine classes and the overall mean F-score obtained by KPCA-DEGSA-HKELM are higher than those of the other two methods.

Table 16 gives the comparison of the overall evaluation metrics for the proposed approach and other literature methods. The

Table 15
The detailed F-score of each class obtained by KPCA-DEGSA-HKELM and the other literature methods with the UNSW-NB15 dataset.

| Class | CPSO-SVM [18] (%) | Dendron [6] (%) | Current work (%) |
| --- | --- | --- | --- |
| Normal | 92.81 | 95.58 | 96.02 |
| Generic | 87.45 | 88.96 | 94.65 |
| Exploits | 74.21 | 76.22 | 83.55 |
| Fuzzers | 44.94 | 68.84 | 74.80 |
| DoS | 20.23 | 16.76 | 34.23 |
| Reconnaissance | 56.15 | 53.42 | 81.43 |
| Analysis | 15.89 | 31.48 | 35.25 |
| Backdoors | 12.41 | 28.01 | 21.87 |
| Shellcode | 32.99 | 22.32 | 37.99 |
| Worms | 27.83 | 6.56 | 43.90 |
| Mean | 46.49 | 48.81 | 60.37 |

accuracy (Acc), mean accuracy (MAcc), mean F-score (MF) and attack accuracy (AAcc) of the current work are higher than those of the other literature methods, while the false alarm rate (FAR) of the current work is lower than that of the other literature methods. In particular, the accuracy of the proposed KPCA-DEGSA-HKELM approach is 89.01%, which represents a 3.45% increase over the suboptimal result of 85.56% achieved by DT [50]. Therefore, the proposed approach has the ability to improve the performance on the UNSW-NB15 problem. In addition, to demonstrate the advantage in computational efficiency of the current work, the results (accuracy, training and testing time) of CPSO-SVM [18] and the proposed KPCA-DEGSA-HKELM with the same training dataset D0 are shown in Table 17. As Table 17 indicates, the accuracy of the current work is 89.01%, which yields a percentage increase of 7.95% over that of the CPSO-SVM method with 81.06%. Furthermore, the training and testing times of CPSO-SVM are 114.226274 s and 14.430140 s, while those of KPCA-DEGSA-HKELM are 43.306235 s and 2.567050 s, respectively. Specifically, KPCA-DEGSA-HKELM achieves savings of 62.09% in training time and 82.21% in testing time,




Fig. 11. Flowchart of the TE process.

Table 16
Comparison of overall evaluation metrics for the proposed approach and other literature methods with the UNSW-NB15 dataset.

| Method | Acc (%) | MAcc (%) | MF (%) | AAcc (%) | FAR (%) |
| --- | --- | --- | --- | --- | --- |
| CPSO-SVM [18] | 81.06 | 49.98 | 46.49 | 44.99 | 5.16 |
| ANN [50] | 81.34 | N/A | N/A | N/A | 21.13 |
| NB [50] | 82.07 | N/A | N/A | N/A | 18.56 |
| DT [50] | 85.56 | N/A | N/A | N/A | 15.78 |
| GALR-DT [51] | 81.42 | N/A | N/A | N/A | 6.39 |
| MP [6] | 73.89 | 27.91 | 26.61 | 20.57 | 4.56 |
| C4.5 [6] | 85.15 | 49.33 | 48.79 | 44.14 | 2.54 |
| Dendron [6] | 84.33 | 52.21 | 48.81 | 47.19 | 2.61 |
| CAI [23] | 82.74 | N/A | N/A | N/A | 36.46 |
| Current work | 89.01 | 59.65 | 60.37 | 55.43 | 2.41 |

Table 17
The results of the CPSO-SVM method and the current work with the testing dataset D1.

| Method | Acc (%) | Training time (s) | Testing time (s) |
| --- | --- | --- | --- |
| CPSO-SVM [18] | 81.06 | 114.226274 | 14.430140 |
| Current work | 89.01 | 43.306235 | 2.567050 |
| Accuracy improved / time saved | 7.95% | 62.09% | 82.21% |

The results are obtained on the same computational platform.

when compared with those of CPSO-SVM. The results of Table 17 indicate the time-saving benefits of KPCA-DEGSA-HKELM for the UNSW-NB15 dataset.

5.3. Case 3: Intrusion detection with the industrial TE process

To further demonstrate the ID performance of the proposed KPCA-DEGSA-HKELM approach in a real and complex environment, an industrial simulation experiment platform with a nonlinear and complex multicomponent TE process is built. This platform simulates the continuous chemical system and attack activities in the real TE process to obtain the TE intrusion data.
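As a sketch of how such an intrusion dataset can be assembled, the snippet below builds the feature matrix and label vector from repeated simulation runs. The phase lengths, sampling time and feature count follow the experiment described in this section, but run_te is a hypothetical stand-in for the actual TE simulator interface.

```python
import numpy as np

SAMPLE_TIME_H = 0.01                      # sampling interval used in the experiment
PHASE1_H, PHASE2_H = 40, 100              # Phase I (normal), then Phase II (attack)
N_FEATURES = 52                           # process variables kept as features
ATTACKS = ["step", "random variation", "slow drift", "sticking"]

rng = np.random.default_rng(0)

def run_te(attack, n_samples):
    """Hypothetical stand-in for the TE simulator: one (n, 52) block of records."""
    return rng.normal(size=(n_samples, N_FEATURES))

n1 = int(PHASE1_H / SAMPLE_TIME_H)        # 4000 normal samples per run
n2 = int(PHASE2_H / SAMPLE_TIME_H)        # 10000 attack samples per run

X_parts, y_parts = [], []
for label, attack in enumerate(ATTACKS, start=1):
    X_parts += [run_te(None, n1), run_te(attack, n2)]
    y_parts += [np.zeros(n1, dtype=int), np.full(n2, label, dtype=int)]

X, y = np.vstack(X_parts), np.concatenate(y_parts)
print(X.shape, np.bincount(y))            # 4 runs of 14000 samples; labels 0-4
```

The class counts produced this way are then subsampled into the much smaller training and testing datasets (E0 and E1) reported later for this case study.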

Table 18
Attack categories of the TE intrusion dataset.

| Attack | Target |
| --- | --- |
| Step | C header pressure loss—reduced availability (stream 4) |
| Random variation | A, B, and C feed compositions (stream 4) |
| Slow drift | Reaction kinetics |
| Sticking | Reactor cooling water valve |

5.3.1. Intrusion simulation experiment of the TE process

The revised TE process model was created by Bathelt et al. [52], which provides a real chemical simulation platform. Five major units, namely the reactor, product condenser, vapor–liquid separator, product stripper and recycle compressor, constitute the TE process [53]. The revised TE process contains four reactants named A, C, D and E as well as an inert component, B. The setup produces two liquid products named G and H and a byproduct named F through a reaction system composed of four irreversible chemical reactions, and the flowchart of the TE process is shown in Fig. 11 [26,52].

The intrusion experiment of the revised TE process was conducted in operating mode 1, which seems to be the most commonly used mode in the literature [30]. The process was first run for 40 h under normal operating conditions (Phase I), and an attack (of the step, random variation, slow drift or sticking category) was then introduced to the process for 100 h (Phase II). Taking the slow drift attack as an example, the corresponding plots of the reactor pressure and G product quality are shown in Figs. 12 and 13, respectively. Figs. 12 and 13 illustrate that the process is in control in Phase I; however, when a slow drift attack is introduced at 40 h, the process tends to become out of control in Phase II. In the practical industrial process, fluctuations in process variables, such as the reactor pressure, will cause a significant reduction in product quality and even severe damage to the equipment. As a result, it is significant to propose an effective ID system for detecting malicious activities in industrial processes. Finally, the values of the 41 measured variables and 12 manipulated variables were collected in sequence during the continuous

Please cite this article as: L. Lv, W. Wang, Z. Zhang et al., A novel intrusion detection system based on an optimal hybrid kernel extreme learning machine, Knowledge-Based Systems (2020) 105648, https://doi.org/10.1016/j.knosys.2020.105648.

L. Lv, W. Wang, Z. Zhang et al. / Knowledge-Based Systems xxx (xxxx) xxx

Fig. 12. Plot of the reactor pressure during both Phase I and Phase II.


operation of the process with a sampling time of 0.01 h, and those 52 variables are used as feature variables in the TE intrusion dataset. Table 18 describes the four major attack categories of the TE intrusion dataset, and Table 19 provides detailed information on the instances in the datasets, where the training and testing datasets are denoted as E0 and E1, respectively.

Fig. 13. Plot of the G product quality during both Phase I and Phase II.

Table 19
The instances of the training and testing datasets for the TE intrusion dataset.

Class               Training dataset E0    Testing dataset E1
Normal              300                    10000
Step                300                    10000
Random variation    300                    10000
Slow drift          300                    10000
Sticking            300                    10000
Total               1500                   50000

5.3.2. Experimental results and discussion

The proposed KPCA-DEGSA-HKELM approach and the CPSO-SVM method are tested with the training dataset E0 and the testing dataset E1. Table 20 compares the solutions of the two methods. As Table 20 shows, the current work exhibits F-scores for all five classes that are superior to those of CPSO-SVM. Moreover, the accuracy (Acc), mean F-score (MF) and attack accuracy (AAcc) of the proposed KPCA-DEGSA-HKELM are 95.82%, 95.90% and 95.16%, respectively, all higher than those of CPSO-SVM, while its false attack rate (FAR) is lower. The proposed approach therefore improves performance on the TE intrusion problem.

In addition, to demonstrate the computational efficiency of the current work, the accuracy, training time and testing time of CPSO-SVM [18] and the proposed KPCA-DEGSA-HKELM with the same training dataset E0 are shown in Table 21. The accuracy of the current work is 95.82%, an improvement of 5.11 percentage points over the 90.71% of the CPSO-SVM method. Furthermore, the training and testing times of CPSO-SVM are 54.184571 s and 0.252235 s, while those of KPCA-DEGSA-HKELM are 21.283896 s and 0.128424 s, respectively; that is, KPCA-DEGSA-HKELM saves 60.72% of the training time and 49.09% of the testing time relative to CPSO-SVM. The results in Table 21 demonstrate the time-saving benefit of KPCA-DEGSA-HKELM on the TE intrusion dataset.

6. Conclusions

An effective IDS named KPCA-DEGSA-HKELM is proposed that successfully detects malicious attacks.

For the classic benchmark KDD99 dataset, the HKELM model based on the presented hybrid kernel function outperforms Poly_KELM with the polynomial kernel and RBF_KELM with the RBF kernel on all the overall evaluation metrics. The hybrid optimization algorithm DEGSA, which combines the advantages of GSA and the DE algorithm, is employed to search for the optimal parameters of the HKELM model. To assess the performance of DEGSA-HKELM, DE-HKELM and GSA-HKELM are also developed and evaluated; applied to the training dataset T0 and testing dataset T1, DEGSA-HKELM improves the mean F-score by approximately 2.09 and 1.25 percentage points over DE-HKELM and GSA-HKELM, respectively. In addition, the KPCA algorithm is introduced for the dimensionality reduction and feature extraction of the ID data. The KPCA-DEGSA-HKELM approach is then carried out with the training dataset T0 and testing dataset T2; compared with other results in the literature, its mean F-score is improved by approximately 28.34, 21.01, 15.93 and 1.44 percentage points over KDDwinner, CSVAC, CPSO-SVM and Dendron, respectively.

For the real modern UNSW-NB15 dataset, all the overall evaluation metrics of the current work are higher than those of the other methods in the literature, while its false attack rate (FAR) is lower. In particular, the accuracy of the proposed KPCA-DEGSA-HKELM approach is 3.45 percentage points above the suboptimal result of 85.56% obtained by DT. For the industrial intrusion TE dataset, the accuracy of the current work is 5.11 percentage points higher than that of the CPSO-SVM method.
Furthermore, KPCA-DEGSA-HKELM achieves higher computational efficiency, with savings of 60.57%, 82.21% and 49.09% in testing time relative to CPSO-SVM on the KDD99, UNSW-NB15 and TE intrusion datasets, respectively. The experimental results on the three ID datasets demonstrate the effectiveness and efficiency of the current work.
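The hybrid-kernel ELM at the core of the approach can be sketched in a few lines. This is an illustrative reimplementation of the standard kernel-ELM equations, not the authors' code; the kernel weight w, the RBF and polynomial parameters, and the penalty C follow the Notation section, but the numeric values below are arbitrary placeholders (in the paper they are tuned by DEGSA).

```python
import numpy as np

def hybrid_kernel(X1, X2, w=0.7, a=0.1, b=1.0, p=2):
    """w * RBF kernel + (1 - w) * polynomial kernel."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)  # squared distances
    k_rbf = np.exp(-a * d2)
    k_poly = (X1 @ X2.T + b) ** p
    return w * k_rbf + (1 - w) * k_poly

class HKELM:
    """Kernel extreme learning machine with the hybrid kernel.
    Output weights solve (K + I/C) beta = T, the standard KELM solution."""
    def __init__(self, C=100.0, **kernel_params):
        self.C = C
        self.kernel_params = kernel_params

    def fit(self, X, T):
        self.X = X
        K = hybrid_kernel(X, X, **self.kernel_params)
        self.beta = np.linalg.solve(K + np.eye(len(X)) / self.C, T)
        return self

    def predict(self, Xnew):
        # Class scores; take argmax over one-hot columns for the label.
        return hybrid_kernel(Xnew, self.X, **self.kernel_params) @ self.beta
```

Training solves one regularized linear system rather than a constrained optimization problem, which is consistent with the training-time advantage reported over the SVM-based methods.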



Table 20
Comparison of the solutions between the proposed approach and CPSO-SVM with the TE intrusion dataset.

                 F-score (%)
Method           Normal   Step    Random   Slow    Sticking   Acc (%)   MF (%)   AAcc (%)   FAR (%)
CPSO-SVM [18]    88.57    95.99   90.19    87.40   91.98      90.71     90.83    90.78      9.58
Current work     91.28    99.09   95.54    97.41   96.21      95.82     95.90    95.16      1.53

Random: Random variation; Slow: Slow drift.

Table 21
Results of the CPSO-SVM method and current work with the testing dataset E1.

Method                          Acc (%)   Training time (s)   Testing time (s)
CPSO-SVM [18]                   90.71     54.184571           0.252235
Current work                    95.82     21.283896           0.128424
Accuracy improved/time saved    5.11%     60.72%              49.09%

The results are obtained on the same computational platform.
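The overall metrics reported in Tables 20 and 21 can be reproduced from a multi-class confusion matrix. The sketch below uses the usual definitions; since the paper's exact AAcc and FAR formulas are not restated in this section, the ones here (accuracy restricted to attack classes, and the fraction of Normal instances flagged as attacks) are assumptions.

```python
import numpy as np

def overall_metrics(cm, normal_idx=0):
    """Acc, mean F-score (MF), attack accuracy (AAcc) and false attack
    rate (FAR) from confusion matrix cm, where cm[i, j] counts instances
    of true class i predicted as class j and row normal_idx is Normal."""
    cm = np.asarray(cm, dtype=float)
    acc = np.trace(cm) / cm.sum()

    # Per-class F-scores, then their unweighted mean (MF).
    f_scores = []
    for k in range(cm.shape[0]):
        tp = cm[k, k]
        p = tp / cm[:, k].sum() if cm[:, k].sum() else 0.0
        r = tp / cm[k, :].sum() if cm[k, :].sum() else 0.0
        f_scores.append(2 * p * r / (p + r) if p + r else 0.0)
    mf = float(np.mean(f_scores))

    # AAcc: fraction of attack instances assigned to their correct attack class.
    attack = [k for k in range(cm.shape[0]) if k != normal_idx]
    aacc = sum(cm[k, k] for k in attack) / cm[attack, :].sum()

    # FAR: fraction of Normal instances misclassified as some attack.
    n_row = cm[normal_idx, :]
    far = (n_row.sum() - n_row[normal_idx]) / n_row.sum()
    return acc, mf, aacc, far
```

For the TE experiment, cm would be a 5 x 5 matrix over the Normal, Step, Random variation, Slow drift and Sticking classes.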

Notation

a        Exponent parameter of the RBF kernel function
b        Constant parameter of the polynomial kernel function
B        Euclidean distance
C        Penalty parameter
fit      Fitness value
G        Gravitational constant
H        Output matrix of the hidden layer
I        Identity matrix
K        Kernel function
M        Gravitational mass
p        Exponent parameter of the polynomial kernel function
tar      Target vector
T        Output matrix of the target
T0       Training dataset
T1, T2   Testing datasets
U        Trial vector
V        Mutant vector
w        Weight coefficient of the hybrid kernel function
x        Input feature vector
y        Output vector
α        Weight vector between the input layer and hidden layer
β        Weight matrix of the hidden layer
γ        Descending coefficient
Γ        High-dimensional feature space
Φ        Nonlinear mapping function

CRediT authorship contribution statement

Lu Lv: Methodology, Software, Writing - original draft. Wenhai Wang: Supervision, Project administration. Zeyin Zhang: Supervision, Project administration. Xinggao Liu: Supervision, Writing - review & editing, Funding acquisition.

Acknowledgments

This work is supported by the National Key R&D Program of China (grant number 2018YFB2004200), the Zhejiang Provincial Natural Science Foundation, PR China (grant number LY18D060002), and the National Natural Science Foundation of China (grant number 61590921); their support is gratefully acknowledged.

References

[1] H.W. Wang, J. Gu, S.S. Wang, An effective intrusion detection framework based on SVM with feature augmentation, Knowl.-Based Syst. 136 (2017) 130–139, http://dx.doi.org/10.1016/j.knosys.2017.09.014.

[2] A.L. Buczak, E. Guven, A survey of data mining and machine learning methods for cyber security intrusion detection, IEEE Commun. Surv. Tutor. 18 (2016) 1153–1176, http://dx.doi.org/10.1109/comst.2015.2494502. [3] R. Yahalom, A. Steren, Y. Nameri, M. Roytman, A. Porgador, Y. Elovici, Improving the effectiveness of intrusion detection systems for hierarchical data, Knowl.-Based Syst. 168 (2019) 59–69, http://dx.doi.org/10.1016/j. knosys.2019.01.002. [4] T. Aldwairi, D. Perera, M.A. Novotny, An evaluation of the performance of restricted Boltzmann machines as a model for anomaly network intrusion detection, Comput. Netw. 144 (2018) 111–119, http://dx.doi.org/10.1016/j. comnet.2018.07.025. [5] W.Y. Feng, Q.L. Zhang, G.Z. Hu, J.X.J. Huang, Mining network data for intrusion detection through combining SVMs with ant colony networks, Future Gener. Comput. Syst. 37 (2014) 127–140, http://dx.doi.org/10.1016/ j.future.2013.06.027. [6] D. Papamartzivanos, F.G. Marmol, G. Kambourakis, Dendron: Genetic trees driven rule induction for network intrusion detection systems, Future Gener. Comput. Syst. 79 (2018) 558–574, http://dx.doi.org/10.1016/j.future. 2017.09.056. [7] C.H. Tsang, S. Kwong, H.L. Wang, Genetic-fuzzy rule mining approach and evaluation of feature selection techniques for anomaly intrusion detection, Pattern Recognit. 40 (2007) 2373–2391, http://dx.doi.org/10.1016/j.patcog. 2006.12.009. [8] F. Salo, A.B. Nassif, A. Essex, Dimensionality reduction with IG-PCA and ensemble classifier for network intrusion detection, Comput. Netw. 148 (2019) 164–175, http://dx.doi.org/10.1016/j.comnet.2018.11.010. [9] C.F. Tsai, C.Y. Lin, A triangle area based nearest neighbors approach to intrusion detection, Pattern Recognit. 43 (2010) 222–229, http://dx.doi.org/ 10.1016/j.patcog.2009.05.017. [10] Y. Li, L. Guo, An active learning based TCM-KNN algorithm for supervised network intrusion detection, Comput. Secur. 26 (2007) 459–467, http: //dx.doi.org/10.1016/j.cose.2007.10.002. 
[11] C. Xiang, P.C. Yong, L.S. Meng, Design of multiple-level hybrid classifier for intrusion detection system using Bayesian clustering and decision trees, Pattern Recognit. Lett. 29 (2008) 918–924, http://dx.doi.org/10.1016/ j.patrec.2008.01.008. [12] G.Y. Chan, C.S. Lee, S.H. Heng, Policy-enhanced ANFIS model to counter SOAP-related attacks, Knowl.-Based Syst. 35 (2012) 64–76, http://dx.doi. org/10.1016/j.knosys.2012.04.013. [13] G.Y. Chan, C.S. Lee, S.H. Heng, Discovering fuzzy association rule patterns and increasing sensitivity analysis of XML-related attacks, J. Netw. Comput. Appl. 36 (2013) 829–842, http://dx.doi.org/10.1016/j.jnca.2012.11.006. [14] G.Y. Chan, C.S. Lee, S.H. Heng, Defending against XML-related attacks in ecommerce applications with predictive fuzzy associative rules, Appl. Soft. Comput. 24 (2014) 142–157, http://dx.doi.org/10.1016/j.asoc.2014.06.053. [15] W.L. Al-Yaseen, Z.A. Othman, M.Z.A. Nazri, Real-time multi-agent system for an adaptive intrusion detection system, Pattern Recognit. Lett. 85 (2017) 56–64, http://dx.doi.org/10.1016/j.patrec.2016.11.018. [16] P.Y. Tao, Z. Sun, Z.X. Sun, An improved intrusion detection algorithm based on GA and SVM, IEEE Access 6 (2018) 13624–13631, http://dx.doi.org/10. 1109/access.2018.2810198. [17] J.P. Liu, J.Z. He, W.X. Zhang, T.Y. Ma, Z.H. Tang, J.P. Niyoyita, W.H. Gui, ANID-SEoKELM: Adaptive network intrusion detection based on selective ensemble of kernel ELMs with random features, Knowl.-Based Syst. 177 (2019) 104–116, http://dx.doi.org/10.1016/j.knosys.2019.04.008. [18] F.J. Kuang, S.Y. Zhang, Z. Jin, W.H. Xu, A novel SVM by combining kernel principal component analysis and improved chaotic particle swarm optimization for intrusion detection, Soft Comput. 19 (2015) 1187–1199, http://dx.doi.org/10.1007/s00500-014-1332-7.


[19] A.I. Saleh, F.M. Talaat, L.M. Labib, A hybrid intrusion detection system (HIDS) based on prioritized k-nearest neighbors and optimized SVM classifiers, Artif. Intell. Rev. 51 (2019) 403–443, http://dx.doi.org/10.1007/s10462-017-9567-1. [20] M.R.G. Raman, N. Somu, K. Kirthivasan, R. Liscano, V.S.S. Sriram, An efficient intrusion detection system based on hypergraph - Genetic algorithm for parameter optimization and feature selection in support vector machine, Knowl.-Based Syst. 134 (2017) 1–12, http://dx.doi.org/10.1016/j.knosys.2017.07.005. [21] A.A. Aburomman, M.B.I. Reaz, A novel SVM-kNN-PSO ensemble method for intrusion detection system, Appl. Soft. Comput. 38 (2016) 360–372, http://dx.doi.org/10.1016/j.asoc.2015.10.011. [22] A.A. Aburomman, M.B. Reaz, A novel weighted support vector machines multiclass classifier based on differential evolution for intrusion detection systems, Inf. Sci. 414 (2017) 225–246, http://dx.doi.org/10.1016/j.ins.2017.06.007. [23] C.R. Wang, R.F. Xu, S.J. Lee, C.H. Lee, Network intrusion detection using equality constrained-optimization-based extreme learning machines, Knowl.-Based Syst. 147 (2018) 68–80, http://dx.doi.org/10.1016/j.knosys.2018.02.015. [24] J.H. Ku, B. Zheng, Intrusion detection based on self-adaptive differential evolution extreme learning machine with Gaussian kernel, in: G. Chen, H. Shen, M. Chen (Eds.), Parallel Archit. Algorithm Program, Paap 2017, 2017, pp. 13–24, http://dx.doi.org/10.1007/978-981-10-6442-5_2. [25] H. Bostani, M. Sheikhan, Hybrid of binary gravitational search algorithm and mutual information for feature selection in intrusion detection systems, Soft Comput. 21 (2017) 2307–2324, http://dx.doi.org/10.1007/s00500-015-1942-8. [26] S.M. He, L. Xiao, Y.L. Wang, X.G. Liu, C.H. Yang, J.G. Lu, W.H. Gui, Y.X.
Sun, A novel fault diagnosis method based on optimal relevance vector machine, Neurocomputing 267 (2017) 651–663, http://dx.doi.org/10.1016/j.neucom. 2017.06.024. [27] X. Qiu, K.C. Tan, J.X. Xu, Multiple exponential recombination for differential evolution, IEEE Trans. Cybern. 47 (2017) 995–1006, http://dx.doi.org/10. 1109/tcyb.2016.2536167. [28] W. Lee, S.J. Stolfo, A framework for constructing features and models for intrusion detection systems, ACM Trans. Inf. Syst. Secur. 3 (2000) 227–261, http://dx.doi.org/10.1145/382912.382914. [29] N. Moustafa, J. Slay, UNSW-NB15: A comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set), in: IEEE Military Communications and Information Systems Conference, MilCIS, 2015, http://dx.doi.org/10.1109/MilCIS.2015.7348942. [30] F. Capaci, E. Vanhatalo, M. Kulahci, The revised Tennessee Eastman process simulator as testbed for SPC and DoE methods, Qual. Eng. 31 (2019) 212–229, http://dx.doi.org/10.1080/08982112.2018.1461905. [31] G.B. Huang, Q.Y. Zhu, C.K. Siew, Extreme learning machine: A new learning scheme of feedforward neural networks, in: 2004 IEEE Int. Jt. Conf. Neural Networks, Vols. 1–4, Proc, 2004, pp. 985–990, http://dx.doi.org/10.1109/ IJCNN.2004.1380068. [32] G.B. Huang, Q.Y. Zhu, C.K. Siew, Extreme learning machine: Theory and applications, Neurocomputing 70 (2006) 489–501, http://dx.doi.org/10. 1016/j.neucom.2005.12.126. [33] J. Wu, Y. Zhu, Z.C. Wang, Z.J. Song, X.G. Liu, W.H. Wang, Z.Y. Zhang, Y.S. Yu, Z.P. Xu, T.J. Zhang, J.H. Zhou, A novel ship classification approach for high resolution SAR images based on the BDA-KELM classification model, Int. J. Remote Sens. 38 (2017) 6457–6476, http://dx.doi.org/10.1080/01431161. 2017.1356487. [34] G.B. Huang, H.M. Zhou, X.J. Ding, R. Zhang, Extreme learning machine for regression and multiclass classification, IEEE Trans. Syst. Man Cybern. B 42 (2012) 513–529, http://dx.doi.org/10.1109/tsmcb.2011.2168604.


[35] G.F. Smits, E.M. Jordaan, Improved SVM regression using mixtures of kernels, in: Proceeding 2002 Int. Jt. Conf. Neural Networks, Vols. 1–3, 2002, pp. 2785–2790, http://dx.doi.org/10.1109/IJCNN.2002.1007589. [36] Z.D. Tian, S.J. Li, Y.H. Wang, X.D. Wang, Wind power prediction method based on hybrid kernel function support vector machine, Wind Eng. 42 (2018) 252–264, http://dx.doi.org/10.1177/0309524x17737337. [37] R. Storn, K. Price, Differential evolution - A simple and efficient heuristic for global optimization over continuous spaces, J. Global Optim. 11 (1997) 341–359, http://dx.doi.org/10.1023/a:1008202821328. [38] S. Das, P.N. Suganthan, Differential evolution: A survey of the state-of-theart, IEEE Trans. Evol. Comput. 15 (2011) 4–31, http://dx.doi.org/10.1109/ tevc.2010.2059031. [39] A.K. Qin, V.L. Huang, P.N. Suganthan, Differential evolution algorithm with strategy adaptation for global numerical optimization, IEEE Trans. Evol. Comput. 13 (2009) 398–417, http://dx.doi.org/10.1109/tevc.2008.927706. [40] E. Rashedi, H. Nezamabadi-Pour, S. Saryazdi, GSA: A gravitational search algorithm, Inform. Sci. 179 (2009) 2232–2248, http://dx.doi.org/10.1016/j. ins.2009.03.004. [41] E. Rashedi, H. Nezamabadi-pour, S. Saryazdi, BGSA: binary gravitational search algorithm, Nat. Comput. 9 (2010) 727–745, http://dx.doi.org/10. 1007/s11047-009-9175-3. [42] M. Zhang, X.G. Liu, Z.Y. Zhang, A soft sensor for industrial melt index prediction based on evolutionary extreme learning machine, Chin. J. Chem. Eng. 24 (2016) 1013–1019, http://dx.doi.org/10.1016/j.cjche.2016.05.030. [43] M. Seyedmahmoudian, R. Rahmani, S. Mekhilef, A.M.T. Oo, A. Stojcevski, T.K. Soon, A.S. Ghandhari, Simulation and hardware implementation of new maximum power point tracking technique for partially shaded PV system using hybrid DEPSO method, IEEE Trans. Sustain. Energy 6 (2015) 850–862, http://dx.doi.org/10.1109/tste.2015.2413359. [44] W.J. Zhang, X.F. 
Xie, DEPSO: Hybrid particle swarm with differential evolution operator, in: 2003 IEEE Int. Conf. Syst. Man Cybern. Vols 1–5, Conf. Proc, 2003, pp. 3816–3821, http://dx.doi.org/10.1109/ICSMC.2003.1244483. [45] I.T. Jolliffe, Principal Component Analysis, 2006. [46] B. Scholkopf, A. Smola, K.R. Muller, Nonlinear component analysis as a kernel eigenvalue problem, Neural Comput. 10 (1998) 1299–1319, http://dx.doi.org/10.1162/089976698300017467. [47] L.L. Guo, P. Wu, J.F. Gao, S.W. Lou, Sparse kernel principal component analysis via sequential approach for nonlinear process monitoring, IEEE Access 7 (2019) 47550–47563, http://dx.doi.org/10.1109/access.2019.2909986. [48] C. Elkan, Results of the KDD'99 classifier learning, ACM SIGKDD Explor. Newsl. 1 (2000) 63–64, http://dx.doi.org/10.1145/846183.846199. [49] S.J. Horng, M.Y. Su, Y.H. Chen, T.W. Kao, R.J. Chen, J.L. Lai, C.D. Perkasa, A novel intrusion detection system based on hierarchical clustering and support vector machines, Expert Syst. Appl. 38 (2011) 306–313, http://dx.doi.org/10.1016/j.eswa.2010.06.066. [50] N. Moustafa, J. Slay, The evaluation of network anomaly detection systems: Statistical analysis of the UNSW-NB15 data set and the comparison with the KDD99 data set, Int. J. Inf. Secur. 25 (2016) 18–31, http://dx.doi.org/10.1080/19393555.2015.1125974. [51] C. Khammassi, S. Krichen, A GA-LR wrapper approach for feature selection in network intrusion detection, Comput. Secur. 70 (2017) 255–277, http://dx.doi.org/10.1016/j.cose.2017.06.005. [52] A. Bathelt, N.L. Ricker, M. Jelali, Revision of the Tennessee Eastman process model, in: 2015 IFAC Symposium on Advanced Control of Chemical Processes ADCHEM, 48, 2015, pp. 309–314, http://dx.doi.org/10.1016/j.ifacol.2015.08.199.
