Chaos, Solitons and Fractals 110 (2018) 33–40
Nearest neighbors based density peaks approach to intrusion detection

Lixiang Li a,b,c,∗, Hao Zhang b,c, Haipeng Peng b,c, Yixian Yang b,c,d

a School of Computer Science and Technology, Henan Polytechnic University, 2001 Century Avenue, Jiaozuo, Henan, 454003, China
b Information Security Center, State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, 100876, China
c National Engineering Laboratory for Disaster Backup and Recovery, Beijing University of Posts and Telecommunications, Beijing, 100876, China
d State Key Laboratory of Public Big Data, Guizhou, 550025, China

∗ Corresponding author at: Information Security Center, State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, 100876, China. E-mail address: [email protected] (L. Li).

https://doi.org/10.1016/j.chaos.2018.03.010
Article info

Article history: Received 28 November 2017; Accepted 5 March 2018

Keywords: Intrusion detection; KDD-CUP 99; k-nearest neighbors (kNN); Density peaks clustering (DPC); Density peaks nearest neighbor (DPNN)
Abstract

Intrusion detection systems are very important for network security, but traditional intrusion detection systems cannot identify new types of network intrusion such as zero-day attacks. Many machine learning techniques have been applied to intrusion detection and have shown better detection performance than other methods. Density peaks clustering (DPC) is a novel clustering algorithm that needs few parameters and whose iterative process is based on density; because of its simple steps and few parameters, it has many potential application fields, and we apply its ideas to intrusion detection in order to obtain a more accurate and efficient classifier. Building on the core ideas of DPC, this paper proposes a hybrid learning model based on k-nearest neighbors (kNN), called density peaks nearest neighbors (DPNN), which introduces density into kNN so as to detect attacks more effectively. KDD-CUP 99, the standard dataset in intrusion detection, is used in the experiment: the dataset is used to train the model and to calculate the parameters of the algorithm, and the DPNN classifier is then used to classify attacks. Experimental results suggest that DPNN performs better than the support vector machine (SVM), k-nearest neighbors (kNN) and many other machine learning methods; it can effectively detect intrusion attacks and achieves good accuracy. © 2018 Elsevier Ltd. All rights reserved.
1. Introduction

With the rapid development of computer network technology and the dramatic increase in the number of computer users, information security has become more and more important for computer networks. Traditional security defense measures used to protect computers and the Internet, such as identity authentication and access control, have exposed many defects and vulnerabilities [1]. Intrusion detection, which emerged from information and network security, has become one of the key technologies: it is not limited by time and space, and it can cope with attacks whose means are hidden and intricate, including crimes committed from inside the network. Because of these characteristics, intrusion detection has become a new element of network security strategy [2]. James Anderson first introduced intrusion detection in 1980, which was the beginning of intrusion detection research. Since then, many methods have been used in intrusion detection, such as
intelligent algorithms [3–5] and clustering [6–8]. In the past few years, many machine learning methods have been used in intrusion detection [9–11]. Chih-Fong Tsai proposed the triangle area based nearest neighbors approach (TANN) [12]; in TANN, k-means clustering is first used to obtain cluster centers corresponding to the attack classes, which allows attacks to be detected more effectively. Wei-Chao Lin introduced an intrusion detection system based on combining cluster centers and nearest neighbors (CANN) [13]; in terms of classification accuracy, the CANN classifier was similar to, and often better than, kNN and support vector machines trained and tested on the original feature representation. Abdulla Amin Aburomman proposed using PSO to generate weights and thereby create an ensemble of classifiers [14]. There are also many algorithms based on SVM or deep learning that have been applied to intrusion detection [15–18]. These methods demonstrate the advantages of machine learning in intrusion detection.

Although intrusion detection is constantly developing, there are still many problems and challenges [19]. Firstly, KDD-CUP 99, the standard data set in the field of intrusion detection, has a total of 41 features [20,21]. Such a high dimension has an enormous influence on efficiency, so the dimension must
be reduced [22–24]. Secondly, many methods have been proposed to improve the detection accuracy of Dos, one of the most common attacks in intrusion detection; although Probe and R2L attacks also appear frequently, there is little work considering them. To address these two problems, many hybrid methods have appeared [25].

kNN is one of the most basic distance-based machine learning methods. It has many advantages, such as ease of implementation, suitability for multi-class classification and few parameters to estimate [26], but it performs poorly when the data are unbalanced. Density peaks clustering is an algorithm that can efficiently solve many kinds of clustering problems [27]. Similar to the k-means clustering method [28], it is based only on the distances between points. Like DBSCAN [29] and the mean shift clustering algorithm, density peaks clustering can not only classify non-spherical data sets into different clusters accurately, but can also find the number of clusters automatically [30]. Unlike the mean shift algorithm, however, the density peaks algorithm does not need to embed the data into a vector space or maximize the density of each class [31]. As far as we know, both of these methods are effective, but little attention has been paid to combining DPC and kNN.

In this paper, we propose a novel method, DPNN, which uses the idea of density [27,32] and the procedure of k-nearest neighbors to detect attacks [33]. First, the data are preprocessed and the features are selected by SVM. Then density is introduced into kNN as an additional criterion and is used together with distance to filter the data. Thanks to the density, kNN becomes more robust to some specific data, and the difficulty of choosing the parameters of DPC is avoided. The experimental results show that this method can greatly improve the accuracy of intrusion detection, and DPNN can reduce testing time by screening out insignificant data. The running time of the algorithm mainly comes from the parameter calculation on the training set; in practical application, once the training set has been processed, DPNN spends little time on classifying abnormal data, so it has very strong practicability.

The rest of this paper is organized as follows: Section 2 briefly describes the two classical methods used in this paper. Section 3 introduces the data set, the feature selection and the rule evaluation. Section 4 presents the proposed DPNN. Section 5 gives several groups of comparative experimental results, and the conclusion and future work are provided in Section 6.
Table 1
Dataset size of KDD-CUP 99.

                 Normal    Probe    Dos        U2R    R2L
Testing data     60,593    4166     231,455    88     14,727
Training data    97,277    4107     391,458    52     1126
2. Basic method

Although data mining has developed for many years, there are still many areas that can be improved [30,34]. We propose DPNN, a method that combines the core ideas of kNN and DPC [27]. This section provides brief reviews of kNN and DPC.

2.1. kNN classifier

The k-nearest neighbors algorithm is a typical nonparametric method [35] whose judgement standard is distance [36]. Compared with other classification algorithms, it is simpler and more efficient [37]. For a given testing sample, kNN finds its nearest neighbors among the training data and uses the labels of those neighbors to assign a label to the testing sample. The parameter k is the core of the method, so the choice of k strongly influences the accuracy, which is the standard evaluation criterion in machine learning. In the comparison experiments, to ensure the diversity of methods and to explore the potential of kNN as fully as possible, we choose several different values of k.

2.2. Density peak clustering

Density peak clustering is a clustering algorithm based on density [38,39]. One of its main assumptions is that each cluster center is surrounded by neighboring points with lower local density, and that cluster centers are relatively far away from any point with a higher local density [40]. Two quantities need to be calculated for each object i: the local density \rho_i and the distance \delta_i to the nearest object with a higher density. Both quantities depend only on the distances between objects. The local density of object i is defined as

\rho_i = \sum_{j} \varphi(d_{ij} - d_c), \qquad (1)

\varphi(x) = \begin{cases} 1, & x < 0, \\ 0, & x \ge 0, \end{cases} \qquad (2)

where d_c is a specified cutoff threshold and \varphi(x) is the indicator function defined in (2). In other words, \rho_i equals the number of objects whose distance from object i is smaller than d_c. Analysis of this formula shows that, on large data sets, the density peaks algorithm is sensitive only to d_c. The algorithm then calculates, for each object, the shortest distance to any object with a higher density:

\delta_i = \begin{cases} \min_{j:\rho_j > \rho_i} d_{ij}, & \text{if } i \text{ is not the point of maximum density,} \\ \max_{j} d_{ij}, & \text{if } i \text{ is the point of maximum density.} \end{cases} \qquad (3)

For the point with the highest density we use \delta_i = \max_j d_{ij}, so that its \delta_i is larger than that of all other points. Points with anomalously large \rho_i and \delta_i are regarded as cluster centers.

3. Preparatory work and rule evaluation

3.1. Dataset description

In this paper, we use the KDD-CUP 99 data set, which originates from the 1998 data produced by the MIT Lincoln Laboratory. Since it was released, it has been regarded as the standard data set for intrusion detection. The KDD-CUP 99 data set contains 494,020 samples. Each sample represents a network connection and is described by a 41-dimensional feature vector. Each sample is labeled as normal or abnormal, and abnormal samples are subdivided into four major categories covering a total of 39 types of attacks:
• Probe: a connection that attempts to search for potential vulnerabilities on the target machine.
• Denial of Service (Dos): a connection that attempts to cause the disruption of a service or to make the target machine crash.
• User to Root (U2R): an attacker who has already gained access wants to gain the privileges of the super user.
• Remote to Local (R2L): an attacker attempts to gain illegal access to a remote computer.
We randomly select five data sets from the KDD-CUP 99 training and testing data. The five data sets are all the same size, and their composition is detailed in Table 2; the sizes of the original KDD-CUP 99 data sets are presented in Table 1.
Table 2
Dataset size used in the experiment.

                 Normal    Probe    Dos     U2R    R2L
Testing data     1000      400      4000    52     100
Training data    1000      400      4000    52     100
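As an illustration of how such fixed-size per-class subsets could be drawn, the following is a minimal pandas sketch. It assumes the KDD-CUP 99 records have been loaded into a DataFrame with a "label" column holding the five class names; the file and column names are illustrative only, not part of the paper's procedure.

import pandas as pd

# Per-class sample sizes from Table 2 (identical for training and testing subsets).
CLASS_SIZES = {"normal": 1000, "probe": 400, "dos": 4000, "u2r": 52, "r2l": 100}

def draw_subset(df: pd.DataFrame, seed: int) -> pd.DataFrame:
    """Draw one experiment subset with the per-class sizes of Table 2."""
    parts = []
    for cls, n in CLASS_SIZES.items():
        pool = df[df["label"] == cls]
        # U2R has only 52 samples in total, so this simply takes all of them.
        parts.append(pool.sample(n=min(n, len(pool)), random_state=seed))
    return pd.concat(parts).sample(frac=1.0, random_state=seed)  # shuffle rows

# Usage sketch: five different subsets, as in the experiments.
# kdd = pd.read_csv("kddcup99.csv")   # hypothetical preprocessed file
# subsets = [draw_subset(kdd, seed=s) for s in range(5)]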
3.2. Data pre-processing

Before starting our work, the data set must be pre-processed so that the machine learning algorithms can be executed efficiently and accurately.

Firstly, each sample of KDD-CUP 99 has 41 dimensions. High-dimensional data like this needs a lot of time to process, so we should reduce its dimension. On the basis of the work of earlier contributors, we compared different features and obtained the importance of each feature. Based on this importance, we choose fifteen features by SVM to make up a new data set used in the experiment [41]. These fifteen features are listed in detail in Table 3.

Before using the classification methods, we also need to ensure that all samples consist only of numbers. Among the fifteen features we use, three features need to be converted (a small sketch of this mapping is given after the list):

• protocol_type: there are three possible values of this feature, TCP, UDP and ICMP; we use the numbers 1–3 to represent them.
• flag: we use the numbers 1–12 to represent the possible statuses SF, S0, S1, S2, S3, OTH, REJ, RSTO, RSTOSO, SH, RSTRH and SHR.
• label: in order to obtain the result, the Normal class is mapped to the number 1; likewise, Probe, Dos, U2R and R2L are mapped to the numbers 2–5, respectively.
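The conversion of the three symbolic features described above is a straightforward dictionary lookup. The following is a minimal sketch, assuming the data are held in a pandas DataFrame whose column names match the KDD-CUP 99 feature names; the encoding values follow the list above.

import pandas as pd

PROTOCOL_MAP = {"tcp": 1, "udp": 2, "icmp": 3}
# The text writes "RSTOSO"; the dataset value is usually spelled "RSTOS0".
FLAG_MAP = {f: i for i, f in enumerate(
    ["SF", "S0", "S1", "S2", "S3", "OTH", "REJ",
     "RSTO", "RSTOS0", "SH", "RSTRH", "SHR"], start=1)}
LABEL_MAP = {"normal": 1, "probe": 2, "dos": 3, "u2r": 4, "r2l": 5}

def encode_symbolic(df: pd.DataFrame) -> pd.DataFrame:
    """Replace the three symbolic columns with the integer codes above."""
    out = df.copy()
    out["protocol_type"] = out["protocol_type"].map(PROTOCOL_MAP)
    out["flag"] = out["flag"].map(FLAG_MAP)
    out["label"] = out["label"].map(LABEL_MAP)
    return out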
3.3. Rule evaluation

The performance of intrusion detection is measured using the confusion matrix shown in Table 4.

• True positives (TP): normal behavior that is correctly predicted as normal.
• True negatives (TN): abnormal behavior that is correctly predicted as abnormal.
• False positives (FP): abnormal behavior that is wrongly predicted as normal.
• False negatives (FN): normal behavior that is wrongly predicted as abnormal.

In the process of classifier design, the evaluation of the classifier is important: a good evaluation index makes it easier to optimize a classification model. In previous works there are several different metrics that can evaluate classification performance, such as precision, recall and sensitivity. The rule evaluation decides the veracity of the reported classification accuracy. In general, we define the standard classification accuracy as

\text{Classification accuracy rate} = \frac{TP + TN}{TP + TN + FP + FN}. \qquad (4)

Therefore, in classification problems the accuracy of every class lies in the range [0, 1]. For all the experimental results we use the accuracy to evaluate performance.

4. Methodology

4.1. The proposed algorithm

Although DPC is a good algorithm, it has some limitations. As a clustering algorithm, it performs well only when the dimension is very low. Moreover, the selection of the center points is not automatic; a subjective choice has to be made, which may affect the accuracy of the algorithm. DPC itself is therefore not suitable for the field of intrusion detection. However, we can borrow some good ideas from it to improve kNN, which already achieves high accuracy in intrusion detection.

Aiming at the existing problems of kNN, we make improvements so that kNN becomes more accurate and practical and can handle more types of data. In kNN the Euclidean distance is usually used to represent the distance between two points. With unbalanced data, however, the fewer samples a class has, the poorer the performance of kNN on that class. In order to solve this problem, we introduce the density of DPC into kNN and obtain a new kNN method based on DPC (DPNN). Unlike the kNN method, which measures class membership only by the Euclidean distance, in DPNN the density is also an important part of the criterion. The density equation (1) of DPC does not apply well to DPNN, so we use the other density formula presented by Rodriguez and Laio [27]:

\rho_i = \sum_{j} \exp\!\left(-\frac{d(x_i, x_j)^2}{d_c^2}\right), \qquad (5)

where d_c is an adjustable parameter representing a cutoff distance. In our algorithm the determination of d_c is very important; selecting d_c is effectively the procedure of selecting the average number of neighbors of all points in the data set. We define d_c as

d_c = d_{N \times r/100}, \qquad (6)

where r is a percentage, D = [d_1, d_2, \ldots, d_N] contains all pairwise distances in the data set sorted in ascending order, N is the total number of these distances, and d_{N \times r/100} \in D is the value at position N \times r/100 of the sorted list.
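To make Eqs. (5) and (6) concrete, the following is a minimal NumPy/SciPy sketch of how the Gaussian local density and the cutoff distance could be computed from a training matrix. The function names, the use of scipy's pdist, and the rounding of the percentile index are our own choices, not part of the original algorithm description.

import numpy as np
from scipy.spatial.distance import pdist, squareform

def cutoff_distance(X: np.ndarray, r: float = 2.0) -> float:
    """Eq. (6): pick d_c at position N*r/100 of the sorted pairwise distances."""
    d = np.sort(pdist(X))                           # all pairwise distances, ascending
    idx = max(int(round(len(d) * r / 100.0)) - 1, 0)
    return d[idx]

def local_density(X: np.ndarray, dc: float) -> np.ndarray:
    """Eq. (5): Gaussian-kernel local density of every training point."""
    D = squareform(pdist(X))                        # full distance matrix
    # Subtract the self term exp(0) = 1 so only neighbors contribute (a common convention).
    return np.exp(-(D / dc) ** 2).sum(axis=1) - 1.0

# Usage sketch:
# dc = cutoff_distance(X_train, r=2.0)
# rho = local_density(X_train, dc)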
4.2. The steps of the algorithm

DPNN is carried out in three steps:

• Calculate the parameters of each point of the training set.
• Classify a sample point of the test set and assign it.
• Repeat the above operation and assign the remaining points.

We have five different training sets and five different test sets, as in Table 2. In the first step, we calculate all pairwise distances between points in the training set. Then, on the basis of Eq. (5), we calculate the density of each point. According to Eq. (6), r is an adjustable parameter; we should select r so that the algorithm has the highest accuracy, and thus we need to find a suitable r over the whole experiment.

Algorithm 1 DPNN algorithm.
Input: the samples X ∈ R^{N×M}, the parameter r
Output: the label vector of cluster index y ∈ R^{N×1}
  step 1: calculate the distance matrix of the training set
  step 2: calculate the density matrix according to Eq. (5) in the training set
  step 3: calculate D_i and ρ_i for point i according to Eqs. (5) and (6)
  step 4: use the core idea of kNN to assign point i by D_i and ρ_i
  step 5: assign each remaining point according to steps 1–4
  return y
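Since r is the only tunable parameter besides k, one simple way to pick it, consistent with the description above, is a grid search over a small range of candidates. The helper below is a sketch; dpnn_predict is an assumed function implementing the DPNN assignment step (a sketch of it is given later in this section), not a library routine.

import numpy as np

def select_r(X_train, y_train, X_val, y_val, k=3, candidates=(1, 2, 3, 4, 5)):
    """Pick the percentage r that gives the best accuracy on a validation split."""
    best_r, best_acc = None, -1.0
    for r in candidates:
        y_pred = dpnn_predict(X_train, y_train, X_val, r=r, k=k)  # assumed helper
        acc = float(np.mean(y_pred == y_val))
        if acc > best_acc:
            best_r, best_acc = r, acc
    return best_r, best_acc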
Table 3
The features used in the experiment.

Feature name                    Feature number    Description
logged_in                       12                1 if the login is successful, else 0
count                           23                Number of connections to the same destination host
dst_host_count                  32                Number of connections to the same IP
protocol_type                   2                 Protocol type, discrete, with three values: TCP, UDP, ICMP
srv_count                       24                Number of connections to the same destination service
dst_host_same_src_port_rate     36                Percentage of connections to the same source port
srv_diff_host_rate              31                Percentage of connections with the same service and a different host
dst_bytes                       6                 Number of bytes from source host to destination host
serror_rate                     25                Percentage of connections with the same service that have SYN errors
srv_serror_rate                 26                Percentage of connections with the same host that have SYN errors
same_srv_rate                   29                Percentage of connections to the same service among all connections
flag                            4                 Connection status; possible values: SF, S0, S1, S2, S3, OTH, REJ, RSTO, RSTOSO, SH, RSTRH, SHR
dst_host_same_srv_rate          34                Percentage of connections to the same service
dst_host_srv_count              33                Total connections to the specific destination port
dst_host_srv_diff_host_rate     37                Percentage of connections to various destinations
Table 4
Confusion matrix of the performance measures for intrusion detection.

                     Predicted Normal    Predicted Abnormal
Actual Normal        TP                  FN
Actual Abnormal      FP                  TN
We find all the points in the training set whose density is larger than that of the sample point; these points form a new candidate set. Next, the sample point is classified with the idea of kNN: we use the Euclidean distance as the distance function and sort the distances in ascending order, so that the k smallest distances correspond to the nearest neighbors. Through this operation we obtain the class to which the sample point belongs and assign it. Then, similarly to the second step, every remaining unlabeled point has higher-density points available for comparison, and we assign each unlabeled point to the class of its nearest higher-density neighbor. After these three steps the algorithm is complete.

Complexity analysis: calculating the similarity matrix is O(N^2). DPNN also needs O(N^2) to calculate the Euclidean distances and the local density. In addition, when choosing d_c, we need O(N log N) to sort the distances with quicksort, and the assignment process takes O(N). The total time complexity of the DPNN algorithm is therefore O(N^2) + O(N^2) + O(N log N) + O(N).
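To make the assignment step concrete, the following is a minimal sketch of how one test point could be classified under the procedure described above (density filter followed by a kNN majority vote). The function and variable names are ours, and the density filter and the fallback when no denser point exists are our reading of the text rather than a definitive implementation; cutoff_distance and local_density refer to the earlier sketch of Eqs. (5) and (6).

import numpy as np
from collections import Counter

def dpnn_predict(X_train, y_train, X_test, r=2.0, k=3):
    """Classify each test point: keep only training points denser than the
    test point, then take a majority vote among its k nearest candidates."""
    dc = cutoff_distance(X_train, r)            # Eq. (6), sketched earlier
    rho_train = local_density(X_train, dc)      # Eq. (5), sketched earlier
    y_pred = np.empty(len(X_test), dtype=y_train.dtype)
    for i, x in enumerate(X_test):
        d = np.linalg.norm(X_train - x, axis=1)         # Euclidean distances to training points
        rho_x = np.exp(-(d / dc) ** 2).sum()            # density of the test point, Eq. (5)
        mask = rho_train > rho_x                        # density filter
        if not mask.any():                              # fall back to plain kNN if nothing is denser
            mask = np.ones(len(X_train), dtype=bool)
        idx = np.argsort(d[mask])[:k]                   # k nearest among the candidates
        votes = y_train[mask][idx]
        y_pred[i] = Counter(votes.tolist()).most_common(1)[0][0]
    return y_pred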
Table 5
Classification results of SVM for the five classes.

Expert        Normal    Probe     Dos       U2R       R2L       Total
Dataset1
  RBF=0.2     96.40%    85.50%    99.62%    83.45%    91.00%    97.81%
  RBF=0.5     96.10%    83.25%    99.60%    76.64%    91.00%    97.69%
  RBF=1       95.90%    81.50%    99.67%    71.45%    93.00%    97.58%
  RBF=2       96.30%    81.50%    99.68%    55.39%    85.00%    97.49%
  RBF=5       96.00%    80.00%    99.65%    43.12%    86.00%    97.01%
Dataset2
  RBF=0.2     96.40%    86.50%    99.58%    73.13%    95.00%    98.01%
  RBF=0.5     97.00%    83.75%    99.67%    60.92%    92.00%    97.78%
  RBF=1       96.50%    85.50%    99.65%    53.74%    84.00%    97.66%
  RBF=2       96.00%    82.75%    99.70%    44.45%    86.00%    97.89%
  RBF=5       95.80%    81.05%    99.88%    35.26%    84.00%    98.03%
Dataset3
  RBF=0.2     95.90%    88.05%    99.88%    81.75%    90.00%    98.06%
  RBF=0.5     95.00%    87.05%    99.76%    65.82%    90.00%    97.87%
  RBF=1       94.00%    86.75%    99.77%    30.45%    88.00%    97.88%
  RBF=2       94.20%    86.00%    99.90%    26.08%    88.00%    98.35%
  RBF=5       94.30%    85.75%    99.75%    23.91%    90.00%    97.86%
Dataset4
  RBF=0.2     96.90%    88.25%    99.58%    69.91%    93.00%    97.29%
  RBF=0.5     96.80%    89.00%    99.78%    58.45%    91.00%    97.71%
  RBF=1       96.70%    89.05%    99.90%    58.45%    88.00%    98.09%
  RBF=2       96.70%    86.20%    99.65%    56.31%    84.00%    97.56%
  RBF=5       96.50%    85.55%    99.77%    38.71%    84.00%    97.30%
Dataset5
  RBF=0.2     98.10%    85.05%    99.40%    65.55%    94.00%    97.69%
  RBF=0.5     97.60%    84.50%    99.42%    57.11%    96.00%    97.66%
  RBF=1       97.10%    84.50%    99.45%    38.69%    91.00%    97.54%
  RBF=2       96.80%    85.25%    99.47%    40.58%    89.00%    97.32%
  RBF=5       96.40%    86.75%    99.48%    29.92%    87.00%    97.11%
5. Experimental results

The experiments are conducted with Enthought Canopy 64-bit using Python, installed on Windows 7 Professional 64-bit with an Intel Pentium processor (8M cache, up to 3.20 GHz) and 8 GB of RAM.

Because we use a machine learning method, many features can be learned through training, so DPNN can find abnormal data even when it belongs to a new type. Like other machine learning methods, DPNN can also detect zero-day attacks through the analysis of the features; in comparison, traditional intrusion detection systems perform poorly at detecting this kind of intrusion.

All of the testing methods are applied to the five different data sets. In the following, we first introduce the results of each method, and then we compare the accuracy and the running time to evaluate the performance of the methods.

5.1. Support vector machine
In SVM, choosing different kernel functions produces different SVMs [42], and the success of the kernel used in SVM depends mostly on the data. We therefore tried the linear kernel function, the polynomial kernel function and the radial basis kernel function to decide which one is more suitable. Owing to the good performance of the radial basis function (RBF) kernel, we chose it for the experiment. In order to improve the accuracy of the experiment, we select different RBF parameters:

K(x, y) = \exp\!\left(-\frac{|x - y|^2}{d^2}\right). \qquad (7)
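For reference, a comparable experiment could be run with scikit-learn's SVC, as in the sketch below. The paper does not state whether the tabulated "RBF" values are the kernel width d of Eq. (7) or another parameterization, so the mapping gamma = 1/d^2 here is our assumption; X_train, y_train, X_test and y_test are assumed to hold the preprocessed 15-feature data.

from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# RBF values tried in the paper: 0.2, 0.5, 1, 2, 5 (interpreted here as the width d).
for d in (0.2, 0.5, 1, 2, 5):
    clf = SVC(kernel="rbf", gamma=1.0 / d**2)
    clf.fit(X_train, y_train)                  # preprocessed data assumed available
    acc = accuracy_score(y_test, clf.predict(X_test))
    print(f"RBF d={d}: accuracy={acc:.4f}")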
Table 5 shows the classification results of SVM for each of the five classes and each of the five data sets. After testing five different RBF values, we find that overall the best value is 0.2. From the table we can see that the Dos and Normal classes are classified accurately, while the low accuracy of U2R is caused by the small sample size of this class, which consists of only fifty-two items.
Fig. 1. Accuracy of SVM experts.
The data used for comparison below are the mean values over these five data sets. For convenience of observation, we average the results of the five data sets and draw the histogram in Fig. 1.
5.2. kNN

In the kNN algorithm, k represents the number of neighbors considered in the training set. The selection of k largely determines the accuracy of the algorithm, and there is no exact method to determine it, so we choose five different values of k to ensure greater diversity. Similarly to SVM, Table 6 shows the classification results of kNN for each of the five classes and each of the five data sets. When k = 3, kNN performs best. According to this set of 25 experiments, the results of the kNN algorithm are almost the same as those of SVM, with SVM appearing slightly better than kNN. To facilitate the comparison, the corresponding histogram is presented in Fig. 2.
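A comparable baseline can be obtained with scikit-learn's KNeighborsClassifier. The loop below over the five k values used in Table 6 is a sketch, with X_train, y_train, X_test and y_test again assumed to hold the preprocessed 15-feature data.

from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

for k in (3, 5, 7, 9, 11):
    knn = KNeighborsClassifier(n_neighbors=k)   # plain Euclidean kNN baseline
    knn.fit(X_train, y_train)
    acc = accuracy_score(y_test, knn.predict(X_test))
    print(f"k={k}: accuracy={acc:.4f}")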
5.3. DPNN

For our proposed approach, the values of k used in kNN were also examined, and we found that DPNN performs best when k = 3. As a rule of thumb, we can choose r in [1, 5] in Eq. (6), and through the experiments we conclude that r = 2 gives the best results. Table 7 shows the classification results of DPNN for each of the five classes and each of the five data sets. In Table 7 we can see the difference from SVM and kNN: besides the Normal and Dos classes, the Probe class now also performs very well, with its accuracy improving by almost 20%. The key to this improvement is the introduction of the peak density, and the greatest advantage of DPNN is the detection of edge points. The accuracy of U2R, however, decreases: because of the introduction of the peak density, classes with very few samples become harder to classify. In general, DPNN performs better than SVM and kNN. Finally, for an intuitive analysis, we draw the histogram in Fig. 3.
Table 6
Classification results of kNN for the five classes.

Expert      Normal    Probe     Dos       U2R       R2L       Total
Dataset1
  k=3       96.30%    85.75%    99.60%    82.69%    91.00%    97.69%
  k=5       96.20%    83.75%    99.60%    71.15%    92.00%    97.44%
  k=7       96.20%    81.50%    99.65%    63.46%    88.00%    97.17%
  k=9       96.40%    81.25%    99.65%    50.00%    85.00%    97.01%
  k=11      95.50%    80.00%    99.65%    26.92%    86.00%    96.56%
Dataset2
  k=3       96.20%    86.75%    99.58%    76.92%    90.00%    97.66%
  k=5       96.70%    85.25%    99.65%    59.62%    89.00%    97.51%
  k=7       96.00%    84.50%    99.65%    53.85%    89.00%    97.28%
  k=9       95.60%    82.25%    99.65%    53.85%    87.00%    97.01%
  k=11      95.40%    81.75%    99.70%    38.46%    86.00%    96.81%
Dataset3
  k=3       95.30%    87.00%    99.78%    78.85%    91.00%    97.69%
  k=5       94.40%    86.75%    99.67%    63.46%    90.00%    97.28%
  k=7       94.40%    85.50%    99.67%    26.92%    89.00%    96.83%
  k=9       94.20%    84.50%    99.78%    23.08%    90.00%    96.78%
  k=11      94.30%    84.50%    99.75%    23.08%    91.00%    96.79%
Dataset4
  k=3       96.80%    88.50%    99.78%    67.31%    95.00%    98.04%
  k=5       96.60%    89.25%    99.90%    59.62%    87.00%    97.93%
  k=7       96.90%    88.00%    99.88%    57.69%    84.00%    97.80%
  k=9       96.80%    86.25%    99.88%    57.69%    84.00%    97.66%
  k=11      96.00%    86.25%    99.90%    34.62%    84.00%    97.32%
Dataset5
  k=3       98.10%    87.00%    99.48%    61.54%    93.00%    97.75%
  k=5       97.90%    86.50%    99.40%    50.00%    96.00%    97.68%
  k=7       97.40%    86.50%    99.45%    32.69%    90.00%    97.35%
  k=9       97.10%    86.75%    99.45%    32.69%    86.00%    97.24%
  k=11      96.70%    87.50%    99.48%    26.92%    84.00%    97.15%
5.4. Further comparisons

In addition to the well-known classical methods above, we include several other strong methods to improve the reliability of the comparison. In order to compare more conveniently, we summarize all of the results: we select the optimal parameters, and the results averaged over the five data sets are shown in Fig. 4. From Fig. 4 we can see that the k-means algorithm has a disappointing performance in all aspects.
Fig. 2. Accuracy of kNN experts.
Fig. 3. Accuracy of DPNN experts.
It cannot be applied in the field of intrusion detection as a clustering algorithm. SVM and kNN have almost the same accuracy, and both perform poorly only on Probe and U2R. The PSO-, LUS- and WMA-based methods [14] are especially strong on Probe and U2R, but they do not perform well enough on Normal. In contrast, the DPNN algorithm keeps the good performance on the Normal and Dos classes and further increases the accuracy on Probe effectively. Across the multiple sets of experiments, the overall accuracy of DPNN is almost always in the leading position compared with other methods such as TANN and CANN [12,13].

Finally, we consider the time DPNN spends in the experiment. DPNN needs more time to calculate the density, but because of the density it spends less time on classification: according to the density, many points are eliminated before the distance comparison. As a whole, therefore, the efficiency of the DPNN algorithm is improved (Table 8).
Fig. 4. Accuracy of all method experts.
Table 7
Classification results of DPNN for the five classes.

Expert      Normal    Probe     Dos       U2R       R2L       Total
Dataset1
  k=3       96.50%    96.50%    99.55%    55.77%    88.00%    98.16%
  k=5       95.90%    95.50%    99.60%    55.77%    86.00%    97.98%
  k=7       95.80%    95.25%    99.60%    55.77%    87.00%    97.96%
  k=9       96.70%    94.00%    99.55%    46.15%    86.00%    97.89%
  k=11      95.90%    94.25%    99.58%    53.85%    87.00%    97.87%
Dataset2
  k=3       96.30%    98.50%    99.38%    88.46%    83.00%    98.36%
  k=5       96.10%    98.00%    99.35%    71.15%    82.00%    98.09%
  k=7       96.20%    97.25%    99.38%    65.38%    82.00%    98.07%
  k=9       96.00%    98.00%    99.40%    63.46%    84.00%    98.14%
  k=11      96.20%    98.00%    99.40%    65.38%    85.00%    97.41%
Dataset3
  k=3       95.70%    94.50%    99.15%    36.54%    88.00%    97.73%
  k=5       96.60%    95.00%    99.20%    44.23%    89.00%    97.30%
  k=7       95.30%    93.75%    99.15%    34.62%    90.00%    97.35%
  k=9       95.50%    94.00%    99.17%    32.69%    90.00%    97.46%
  k=11      96.00%    94.00%    99.20%    32.69%    90.00%    97.77%
Dataset4
  k=3       94.30%    96.00%    99.60%    55.77%    88.00%    97.80%
  k=5       95.10%    96.75%    99.62%    50.00%    81.00%    97.41%
  k=7       95.10%    92.25%    99.72%    40.38%    78.00%    97.75%
  k=9       94.70%    96.75%    99.62%    53.85%    80.00%    97.93%
  k=11      95.30%    97.00%    99.65%    55.77%    81.00%    97.02%
Dataset5
  k=3       96.80%    96.75%    99.48%    50.00%    82.00%    98.02%
  k=5       96.70%    97.50%    99.60%    48.08%    84.00%    98.16%
  k=7       97.10%    95.00%    99.50%    46.15%    83.00%    97.95%
  k=9       98.00%    97.25%    99.48%    26.92%    86.00%    98.13%
  k=11      97.70%    97.00%    99.52%    40.38%    82.00%    98.14%
Table 8
Classification results of all methods.

Expert       Normal    Probe     Dos       U2R       R2L
KNN          96.30%    85.75%    99.60%    82.69%    91.00%
SVM          96.40%    85.50%    99.62%    83.45%    91.00%
PSO based    83.46%    96.14%    98.84%    99.80%    84.73%
LUS based    83.50%    96.88%    98.85%    99.81%    84.72%
WMA based    65.79%    96.22%    98.58%    99.79%    82.90%
TANN         97.01%    94.89%    90.94%    60.00%    80.53%
CANN         97.04%    87.61%    99.68%    3.85%     57.02%
DPNN         96.50%    96.50%    99.55%    55.77%    88.00%
6. Conclusion

The increasing complexity of network attacks has made people recognize the necessity of alarms and develop a clearer understanding of intrusion detection. Intrusion detection technology was put forward more than 30 years ago and has a variety of applications. This paper gives a novel method, DPNN, that combines density peaks clustering and k-nearest neighbors to obtain high accuracy in intrusion detection. Density peaks clustering is mainly used for training, and k-nearest neighbors is mainly used for classification. On the KDD-CUP 99 data set reduced to 15 dimensions, DPNN performs better than kNN, SVM and other classifiers in terms of average accuracy, especially on the Probe attack: on average, the accuracy on Probe improves by almost 15% compared with the best basic method. Although DPNN spends more time on calculating the density in the training stage, its testing time is greatly reduced because the density is introduced into the method; on the whole, the efficiency is also increased by 20.688%.

For future work, DPNN cannot effectively detect the U2R attack, which means that this 15-dimensional feature set does not represent the pattern of this type of attack well, because the important features of U2R differ from those of the other attacks. We should therefore come up with a new method to deal with the classification of U2R attacks.

Acknowledgments

The authors would like to thank the editorial board and reviewers. This paper is supported by the National Key Research and Development Program of China (Grant nos. 2016YFB0800602, 2016YFB0800604), the National Natural Science Foundation of China (Grant nos. 61472045, 61573067, 61771071), the "13th Five-Year" National Crypto Development Fund (Grant no. MMJJ20170122), and the Beijing City Board of Education Science and Technology Key Project (Grant no. KZ201510015015).

References

[1] Wang Z, Wang L, Szolnoki A, Perc M. Evolutionary games on multilayer networks: a colloquium. Eur Phys J B 2015;88(5):124. doi:10.1140/epjb/e2015-60270-7.
[2] Wu SX, Banzhaf W. The use of computational intelligence in intrusion detection systems: a review. Appl Soft Comput 2010;10(1):1–35. doi:10.1016/j.asoc.2009.06.019.
[3] Chung YY, Wahid N. A hybrid network intrusion detection system using simplified swarm optimization (SSO). Appl Soft Comput 2012;12(9):3014–22. doi:10.1016/j.asoc.2012.04.020.
[4] Feng W, Zhang Q, Hu G, Huang JX. Mining network data for intrusion detection through combining SVMs with ant colony networks. Fut Generat Comput Syst 2014;37:127–40. doi:10.1016/j.future.2013.06.027.
[5] Lin S-W, Ying K-C, Lee C-Y, Lee Z-J. An intelligent algorithm with feature selection and decision rules applied to anomaly intrusion detection. Appl Soft Comput 2012;12(10):3285–90. doi:10.1016/j.asoc.2012.05.004.
[6] Jiang S, Song X, Wang H, Han J-J, Li Q-H. A clustering-based method for unsupervised intrusion detections. Pattern Recognit Lett 2006;27(7):802–10. doi:10.1016/j.patrec.2005.11.007.
[7] Jain AK, Murty MN, Flynn PJ. Data clustering: a review. ACM Comput Surv 1999;31(3):264–323. doi:10.1145/331499.331504.
[8] Nadiammai G, Hemalatha M. Effective approach toward intrusion detection system using data mining techniques. Egypt Inform J 2014;15(1):37–50. doi:10.1016/j.eij.2013.10.003.
[9] Al-Jarrah OY, Alhussein O, Yoo PD, Muhaidat S, Taha K, Kim K. Data randomization and cluster-based partitioning for botnet intrusion detection. IEEE Trans Cybern 2016;46(8):1796–806. doi:10.1109/TCYB.2015.2490802.
[10] Buczak AL, Guven E. A survey of data mining and machine learning methods for cyber security intrusion detection. IEEE Commun Surv Tut 2016;18(2):1153–76. doi:10.1109/COMST.2015.2494502.
[11] Zheng W, Fu X, Ying Y. Spectroscopy-based food classification with extreme learning machine. Chemometr Intell Lab Syst 2014;139:42–7. doi:10.1016/j.chemolab.2014.09.015.
[12] Tsai C-F, Lin C-Y. A triangle area based nearest neighbors approach to intrusion detection. Pattern Recognit 2010;43(1):222–9. doi:10.1016/j.patcog.2009.05.017.
[13] Lin W-C, Ke S-W, Tsai C-F. CANN: an intrusion detection system based on combining cluster centers and nearest neighbors. Knowl Based Syst 2015;78:13–21. doi:10.1016/j.knosys.2015.01.009.
[14] Aburomman AA, Reaz MBI. A novel SVM-kNN-PSO ensemble method for intrusion detection system. Appl Soft Comput 2016;38:360–72. doi:10.1016/j.asoc.2015.10.011.
[15] Erfani SM, Rajasegarar S, Karunasekera S, Leckie C. High-dimensional and large-scale anomaly detection using a linear one-class SVM with deep learning. Pattern Recognit 2016;58:121–34. doi:10.1016/j.patcog.2016.03.028.
[16] Gan X-S, Duanmu J-S, Wang J-F, Cong W. Anomaly intrusion detection based on PLS feature extraction and core vector machine. Knowl Based Syst 2013;40:1–6. doi:10.1016/j.knosys.2012.09.004.
[17] Kuang F, Xu W, Zhang S. A novel hybrid KPCA and SVM with GA model for intrusion detection. Appl Soft Comput 2014;18:178–84. doi:10.1016/j.asoc.2014.01.028.
[18] Bostani H, Sheikhan M. Modification of supervised OPF-based intrusion detection systems using unsupervised learning and social network concept. Pattern Recognit 2017;62:56–72. doi:10.1016/j.patcog.2016.08.027.
[19] Wang Z, Moreno Y, Boccaletti S, Perc M. Vaccination and epidemics in networked populations: an introduction. Chaos Solitons Fract 2017;103:177–83. doi:10.1016/j.chaos.2017.06.004.
[20] Ambusaidi MA, He X, Nanda P, Tan Z. Building an intrusion detection system using a filter-based feature selection algorithm. IEEE Trans Comput 2016;65(10):2986–98. doi:10.1109/TC.2016.2519914.
[21] Mitra P, Murthy CA, Pal SK. Unsupervised feature selection using feature similarity. IEEE Trans Pattern Anal Mach Intell 2002;24(3):301–12. doi:10.1109/34.990133.
[22] de la Hoz E, de la Hoz E, Ortiz A, Ortega J, Martínez-Álvarez A. Feature selection by multi-objective optimisation: application to network anomaly detection by hierarchical self-organising maps. Knowl Based Syst 2014;71:322–38. doi:10.1016/j.knosys.2014.08.013.
[23] Wang D, Nie F, Huang H. Feature selection via global redundancy minimization. IEEE Trans Knowl Data Eng 2015;27(10):2743–55. doi:10.1109/TKDE.2015.2426703.
[24] Lee J, Chang K, Jun C-H, Cho R-K, Chung H, Lee H. Kernel-based calibration methods combined with multivariate feature selection to improve accuracy of near-infrared spectroscopic analysis. Chemometr Intell Lab Syst 2015;147:139–46. doi:10.1016/j.chemolab.2015.08.009.
[25] Kim G, Lee S, Kim S. A novel hybrid intrusion detection method integrating anomaly detection with misuse detection. Expert Syst Appl 2014;41(4, Part 2):1690–700. doi:10.1016/j.eswa.2013.08.066.
[26] Aldwairi M, Khamayseh Y, Al-Masri M. Application of artificial bee colony for intrusion detection systems. Secur Commun Netw 2015;8(16):2730–40. doi:10.1002/sec.588.
[27] Rodriguez A, Laio A. Clustering by fast search and find of density peaks. Science 2014;344(6191):1492–6. doi:10.1126/science.1242072.
[28] Liang HW, Chung WH, Kuo SY. Coding-aided k-means clustering blind transceiver for space shift keying MIMO systems. IEEE Trans Wireless Commun 2016;15(1):103–15. doi:10.1109/TWC.2015.2467394.
[29] Kumar KM, Reddy ARM. A fast DBSCAN clustering algorithm by accelerating neighbor searching using groups method. Pattern Recognit 2016;58:39–48. doi:10.1016/j.patcog.2016.03.008.
[30] Chen C, Mabu S, Shimada K, Hirasawa K. Network intrusion detection using class association rule mining based on genetic network programming. IEEJ Trans Electr Electron Eng 2010;5(5):553–9. doi:10.1002/tee.20572.
[31] Panda M, Abraham A, Patra MR. Hybrid intelligent systems for detecting network intrusions. Secur Commun Netw 2015;8(16):2741–9. doi:10.1002/sec.592.
[32] Chen M, Li L, Wang B, Cheng J, Pan L, Chen X. Effectively clustering by finding density backbone based-on kNN. Pattern Recognit 2016;60:486–98. doi:10.1016/j.patcog.2016.04.018.
[33] Du M, Ding S, Jia H. Study on density peaks clustering based on k-nearest neighbors and principal component analysis. Knowl Based Syst 2016;99:135–45. doi:10.1016/j.knosys.2016.02.001.
[34] Zhu D, Premkumar G, Zhang X, Chu C-H. Data mining for network intrusion detection: a comparison of alternative methods. Decis Sci 2001;32(4):635–60. doi:10.1111/j.1540-5915.2001.tb00975.x.
[35] Choi S, Ghinita G, Lim HS, Bertino E. Secure kNN query processing in untrusted cloud environments. IEEE Trans Knowl Data Eng 2014;26(11):2818–31. doi:10.1109/TKDE.2014.2302434.
[36] Patra BK, Nandi S, Viswanath P. A distance based clustering method for arbitrary shaped clusters in large datasets. Pattern Recognit 2011;44(12):2862–70. doi:10.1016/j.patcog.2011.04.027.
[37] Tang Y, Jing L, Li H, Atkinson PM. A multiple-point spatially weighted k-NN method for object-based classification. Int J Appl Earth Obs Geoinf 2016;52:263–74. doi:10.1016/j.jag.2016.06.017.
[38] Zhang Y, Chen S, Yu G. Efficient distributed density peaks for clustering large data sets in MapReduce. IEEE Trans Knowl Data Eng 2016;28(12):3218–30. doi:10.1109/TKDE.2016.2609423.
[39] Kaneko H, Funatsu K. Data density-based fault detection and diagnosis with nonlinearities between variables and multimodal data distributions. Chemometr Intell Lab Syst 2015;147:58–65. doi:10.1016/j.chemolab.2015.07.016.
[40] Wang S, Wang D, Li C, Li Y, Ding G. Clustering by fast search and find of density peaks with data field. Chin J Electron 2016;25(3):397–402. doi:10.1049/cje.2016.05.001.
[41] Dong A, Chung F-L, Deng Z, Wang S. Semi-supervised SVM with extended hidden features. IEEE Trans Cybern 2016;46(12):2924–37. doi:10.1109/TCYB.2015.2493161.
[42] Wu J, Yang H. Linear regression-based efficient SVM learning for large-scale classification. IEEE Trans Neural Netw Learn Syst 2015;26(10):2357–69. doi:10.1109/TNNLS.2014.2382123.