Available online at www.sciencedirect.com
Procedia Technology 6 (2012) 996 – 1003
2nd International Conference on Communication, Computing & Security [ICCCS-2012]
Hybrid Approach for Detection of Anomaly Network Traffic using Data Mining Techniques Basant Agarwal*, Namita Mittal Department of Computer Engineering, Malviya National Institute of Technology Jaipur-302016, India
Abstract Anomaly based Intrusion Detection System (IDS) is getting popularity due to its adaptability to the changes in the behavior of network traffic as it has the ability to detect the new attacks. As it is very difficult to set any predefined rule for identifying correctly attack traffic since there is no major difference between normal and attack traffic. In this paper, Anomaly traffic detection system based on the Entropy of network features and Support Vector Machine (SVM) are compared. Further, a hybrid technique that is combination of both entropy of network features and support vector machine is compared with individual methods. DARPA Intrusion Detection Evaluation dataset is used in order to evaluate the methods. It is proved that entropy based detection technique is capable of identifying anomalies in network better than support vector machine based detection system. In addition, hybrid approach outperforms entropy and SVM based techniques. © 2012 Published by Elsevier Ltd. © 2012 Elsevier and/or peer-reviewofunder responsibility the Department Science & Institute Selection and/or Ltd...Selection peer-review under responsibility the Department of of Computer Scienceof & Computer Engineering, National Engineering, National Institute of Technology Rourkela of Technology Rourkela Keywords: Anomaly detection; data mining; support vector machine;
1. Introduction Securing network is increasingly becoming of great significance. Intrusion detection provides mechanism for securing networks. Intrusion Detection System (IDS) mainly uses two types of techniques, signature based intrusion detection system and anomaly based (Denning, D.E. 1987). Signature based IDS uses predetermined and pre-configured rules or signatures to identify traffic as attack traffic or legitimate traffic and second is
* Corresponding author. Tel.: +91-9829013908 E-mail address:
[email protected],
[email protected]
2212-0173 © 2012 Elsevier Ltd...Selection and/or peer-review under responsibility of the Department of Computer Science & Engineering, National Institute of Technology Rourkela doi:10.1016/j.protcy.2012.10.121
Basant Agarwal and Namita Mittal / Procedia Technology 6 (2012) 996 – 1003
anomaly based intrusion system, it refers to the problem of finding patterns in the traffic data that do not behave as expected and alarms an attack if there is abnormal behavior in the traffic pattern. Problem with signature based intrusion detection system is that it can only detect the attacks of which it has rules. On the contrary anomaly based intrusion detection system can detect a new attack with the assumption that at the time of attack, network behavior changes (Q.A. Tran et al., 2004). Over time, a variety of anomaly detection techniques have been developed in several research communities (Varun Chandola et al., July 2009). Network Entropy is a measure of uncertainty or disorder in the network. Network entropy can be used for identifying abnormal behavior in the network. In this paper, Entropy of different network features are calculated and observed that at the time of attack, entropy values of these network features deviates from fixed predefined range. It has been observed that using entropy values, it could detect attacks but with high false alarm rate. Support Vector Machine (SVM) is one of the best classification techniques (C. Cortes and V. Vapnik.1995, V. Vapnik, 1995). In this paper, Support vector machine is used for creating classifier that can assign network traffic in one of two classes that is normal traffic and attack traffic. SVM model is build using different network features. In this paper, a hybrid method is also proposed for anomaly detection, combining both the techniques that are entropy and support vector machine. Firstly, Normalized entropy values of different network features are calculated. Then SVM model is trained in order to classify the normal traffic vs. attack traffic. To understand and evaluate the anomaly traffic detection techniques, second week of traffic data provided by MIT Lincoln Laboratory is used (DARPA, 1999). The remainder of this paper is as follows. Section 2 discusses related research that uses entropy and data mining techniques for anomaly detection, Section 3 and section 4 explains the entropy based method and support vector machine based method for detecting anomalies in network traffic respectively. Section 5, describes the proposed hybrid method for anomaly detection. Performance evaluation and Results are included in Section 6. Finally, this paper is concluded in Section 7. 2. Related Work Anomaly based intrusion detection system uses entropy values of different network features and different data mining techniques (Wenke Lee and Dong Xiang. 2001, Yoshida, K., Aug. 2003). The entropy of different network feature attributes is observed under normal and abnormal network conditions (G. Nychis et al., 2008). The network traffic is affected by different attacks, such as Denial of service attack, port scanning, etc. Further work on finding out whether the entropy values of different attributes are highly correlated is given by G. Nychis et al. (2008). Altyeb Altaher et al. (28-30 Oct. 2011) propose an online anomaly detection method based on relative entropy. They captured network traffic features using packet sniffing tool and used relative entropy to find out whether the network is under attack. Two entropy methods, network entropy and normalized relative network entropy have been developed to analyze network behaviors (Qian Quan et al., 2009). Radial-basis function neural network (RBFNN) and support vector machines (SVM) is applied to solve the Denial of Service (DoS) problem and compared which is a better technique for detecting DoS attacks (Tsang, G.C.Y. et al., 2004). Data mining based intrusion detection system mainly uses classification, nearest neighborhood, and clustering algorithm (H. Yang, et al., 2006, Varun Chandola et al., July 2009). Varun Chandola et al., (July 2009) provides survey of anomaly detection covering details of these techniques. A classification method is proposed based on SVM with a weighted voting schema for detecting intrusions. Entropy, TF (term frequency) and TF-IDF (term frequency and inverse document frequency) features are extracted from processes, then these are sent to the SVM model for training of classifier. Finally, a voting scheme is used for detecting intrusions (R. C. Chen and S. P. Chen, 2008). Keunsoo Lee et al., ( 2008) compute entropy values of different network features to analyze the behavior of network traffic and detected each phase of the attack using cluster analysis
997
998
Basant Agarwal and Namita Mittal / Procedia Technology 6 (2012) 996 – 1003
method. Q.A. Tran et al., (2004) propose anomaly network traffic detection based on Tcpstat and One-class SVM. In which, the Tcpstat collects network traffic, and its outputs are sent to One-class SVM for training the normal traffic behavior. At the time of attack, any deviation from the normal traffic raises the alarm indicating attack.
3. Entropy based Intrusion Detection System Entropy is a measurement of uncertainty or randomness. Entropy of different network features are calculated to create anomaly based intrusion detection system. If the entropy of network features deviate from a predefined range that indicates the randomness or abnormality in the network traffic that is anomaly in the network traffic. The value of network entropy tends towards 1 when there is high level of uncertainty in the traffic. Like in the Distributed Denial of Service (DDoS) attack period, entropy value of source IP address will ain. Similarly, entropy of destination IP address will also decrease as the destination IP will be of the victims system for most of the packets at that time (Keunsoo Lee et al., April 2008). Hence, the entropy values of source and destination IP addresses can be good measures. Similarly the entropy values of source and destination port numbers can be used to detect anomalies source and destination IP will increase (Keunsoo Lee et al., April 2008). In addition, the entropy value of packet type is worth observing because attacker uses specific packet type such as ICMP packet in ICMP flood attack and UDP packet in UDP flood attack. If the entropy of packet type converges to a small value, it indicates an attack (Keunsoo Lee et al., April 2008). Entropy of packet size of each packet is also an important network feature for identifying intrusion as in attack duration, number of similar size packets exceed than the number of types of packets in normal traffic condition; hence entropy of packet size distribution decreases during the attack Ping (Du ,Shunji Abe, Dec. 2007). 3.1 Network feature extraction: Algorithm shown in Figure 1 is used to calculate the following network features. Normalized entropy of packet size. Normalized entropy of source IP address Normalized entropy of source port number. Normalized entropy of destination IP address Normalized entropy of destination port number. Normalized entropy of packet type (ICMP, TCP, and UDP). During time of attack, normalized entropy values will deviate from its normal behavior, and that is used for detecting the anomaly in the traffic. Threshold is predefined for entropy values to detect deviation from normal behavior i.e. attack traffic.
Basant Agarwal and Namita Mittal / Procedia Technology 6 (2012) 996 – 1003
1. Input Network traffic 2. Output Normalized entropy for each network feature 3. loop: each time interval //till the traffic comes 3.1 Extract features from packet header (for example: source IP). 3.2 loop: for each packet in the time interval 3.2.1 Calculate frequency of all distinct source IP end 3.3 Loop: for each distinct IP 3.3.1 Calculate probability for each distinct source IP address Pi = mi/T Here, mi = frequency of ith source IP T= total number of packets in that time interval 3.3.2 Calculate entropy for each distinct IP address hi= - pi log pi end 3.4 Normalize the entropy, in the time interval by H = hi/log(F) // F is total number of distinct source IP address. END
Figure 1: Algorithm for calculating normalized entropy
4. Support Vector Machine Based Anomaly Detection Support vector machine is one of the useful classification techniques (Thair Nu Phyu, March, 2009). SVM model is learned using different network features for classifying the normal traffic and attack traffic. After training of SVM model using network features, it will be able to predict whether the traffic falls into the one category that is attack traffic or the normal traffic. 4.1 Network feature extraction Following network features are used. Number of distinct source IP address Number of distinct source port number. Number of distinct destination IP address Number of distinct destination port number. Number of distinct packet type (ICMP, TCP, and UDP). Number of distinct packets with same packet size. These network features are calculated from the network in every 60 seconds using libpcap library (LIBPCAP), then these values are normalized. Normalized values of network features are used for learning the SVM model, and that SVM model can be used to detect changes in the behavior or anomalies of the network.
999
1000
Basant Agarwal and Namita Mittal / Procedia Technology 6 (2012) 996 – 1003
5. Hybrid Method for Anomaly Detection A hybrid approach is used for detecting anomalies in the network which is a combination of both entropy and support vector machine methods. By combining the entropy of network feature and support vector machine benefits of both the techniques are used and demerits of both the techniques can be removed. Entropy based techniques has the advantage that it can better represent the properties of the network traffic and support vector machine is good for classification. Support vector machine with direct network features does not give good results. In this hybrid method, firstly normalized entropy of network features are calculated using the algorithm given in Figure 1, then these normalized entropies are sent to SVM model for learning the behavior of the network. Now this trained SVM model can classify the network traffic in attack traffic or legitimate traffic. Overview of hybrid anomaly intrusion detection system is shown in Figure 2.
Figure 2: Overview of hybrid approach
6. Performance Evaluation and Results DARPA Intrusion Detection Evaluation dataset has been used for evaluation of our technique provided by Massachusetts Institute of Technology, Lincoln Lab (http://www.ll.mit.edu/IST/ideval/) . Second week of the data provided by them contains different attack and legitimate traffic, which is used for evaluation. Attack that has anomaly signature is considered for evaluating the ability of the anomaly based intrusion detection system. These attacks are portsweep, mailbomb, ipsweep, satan, and Neptune. In this paper, correctly classified term is used for true positives and false negatives for the traffic. Similarly, misclassified term is used for true negatives and false positives. In entropy based anomaly detection system, firstly normalized entropy of network traffic features is calculated in every 60 seconds. Threshold value is fixed for each feature for identifying the anomalies based on experiments. Then voting system for each feature decides whether there is an attack or not. This method is able to produce good results in case of detecting attack traffic but it also produces high false alarms, because
Basant Agarwal and Namita Mittal / Procedia Technology 6 (2012) 996 – 1003
the entropy values can also be deviate from the range or towards 0 or 1 in case of legitimate traffic. For example, during exams in the university, a lot of students will use network to access a lot of different websites. As in this case, e deviation of entropy values from fixed range. Entropy based method in itself is not effective for detecting anomalies in the traffic; also it is not much adaptive to the behavior changes in the network and it has fixed threshold values.
Figure 3: Entropy values of different network parameters
Experimental results of entropy based system are 85.71% classified correctly and 14.29% misclassified
1001
1002
Basant Agarwal and Namita Mittal / Procedia Technology 6 (2012) 996 – 1003
instances as shown in Table 1. In Figure 3, graphs for entropy values of different network features with respect to time is given, Figure 3 shows that at the time of attack entropy values deviated from fixed range in red circles. These graphs are for Monday evaluation of intrusion detection. In support vector machine based anomaly detection system, network features, given in section 4.1 are used for training the SVM model. Then this SVM model is used for classifying normal traffic and attack traffic. Experimental results shows that using this method correctly classified instances is 77.71% and misclassified instances are 22.29% as shown in Table 1, which is very less as compare to other methods. SVM based method is also not very adaptive to changes in the network because network features used for training the model is not able to represent the properties of the network features wholly. Hybrid approach exploits the benefits of both the techniques i.e. entropy based and support vector machine based. Entropy based method has the benefit of representing the properties of the network and support vector machine has the advantage of good classification. Hybrid anomaly detection system learns the behavior of network traffic from the normalized entropy values of different network features. This hybrid system outperforms entropy and support vector machine methods. It reduces the false alarms and increases detection rate of anomalies. Experimental results shows that this hybrid technique gives 97.25% correctly classified instances and 2.75% misclassified instances as provided in Table 1. In hybrid approach, both the methods complement each other and outperform other techniques. Table 1: Results of each anomaly based detection systems Entropy Based
SVM Based
Hybrid Method
Correctly Classified
85.71%
77.71%
97.25%
Misclassified
14.29%
22.29%
2.75%
7. Conclusion Firstly two anomaly based intrusion detection system that is entropy based and support vector machine based are experimented with important network features. Then hybrid method (based on both) is proposed. It is observed that Hybrid anomaly detection system is a good combination of entropy and SVM for detecting anomaly network traffic. As problem with simple entropy based method, is fixed threshold range for entropy is set/fixed for identifying anomaly detection. This method is not very dynamic to decide whether there is an attack, because entropy values can deviate from the fixed range in normal conditi false alarm rate. Support vector machine based model alone does not give very good results as network features are used for learning without processing. Experimental result demonstrates that this hybrid method works well with high accuracy for detection of attack traffic and less false alarms. In future work, anomaly based model can be analyzed with network traffic more effectively by extracting more network features and develop a more robust and dynamic detection algorithm. It may be useful to apply the proposed method with various attacks and data sets. References Altyeb Altaher, Sureswaran Ramadass, Bhavani Thuraisingham, Mohammad Mehedy, 28-Line anomaly Detection Network and Multimedia Technology (IC-BNMT), 2011 4th IEEE International Conference, pp 33-36. ademic Publishers, Boston. pp: 273- 297. C. Cortes and V. Vapnik. 1995, DARPA Intrusion Detection Evaluation Data Sets 1999, available at http://www.ll.mit.edu/IST/ideval/data/data\index.html. Denning, D.E. An Intrusionons., SE-13, Issue: 2 pp 222-232.
1003
Basant Agarwal and Namita Mittal / Procedia Technology 6 (2012) 996 – 1003
SIGCOMM conference on Internet measurement,, ACM Press,,pp 151-156. H. Yang, F. Xie, international conference on Fuzzy Systems and Knowledge Discovery Pages 1082-1091. Keunsoo Lee, Juhyun Kim, Ki Hoon Kwon, Younggoo Han, Sehun Ki - 1665. LIBPCAP. http://www.tcpdump.org/index.html -Inspired Models of Network, Information and Computing Systems. Bionetics. pp- 93-96. -class Support Vector Machine for Anomaly Network Traffic Detecti Q.A. Tran, H. Duan, and X. Li, 2004. Research Workshop of the 18th APAN, Cairns, Australia. Qian Quan, Che Hong-Yi, Zhang Rui, 2009, International Symposium on Dependable Computing,, pp. 189 - 191 -IDF. International Journal of Innovative Computing, Information and Control (IJICIC) Vol. 4, Number 2, pp. 413-424. Thair Nu Phyu, March 18 Conference of Engineers and Computer Scientists 2009 Vol (I)- IMECS 2009, Hong Kong. Tsang, G.C.Y.; Chan, P.P.K.; Yeung, D.S.; Tsang, E.C.C., 26-29 Aug. 2004, "Denial of service detection by support vector machines and radial-basis function neural network," Machine Learning and Cybernetics, 2004. Proceedings of 2004 International Conference on , vol.7, no., pp. 4263- 4268 vol.7,. Varun Chandola, Arindam Banerjee, vipin kumar, July 2009, (CSUR), Volume 41 Issue 3. Springer-Verlag: New York. V. Vapnik, 1995, Wenke Lee , Dong Xiang, May 14-16, 2001, Information-Theoretic Measures for Anomaly Detection , Proceedings of the 2001 IEEE Symposium on Security and Privacy, p.130. Yoshida, K., Aug. 2003, "Entropy based intrusion detection," Communications, Computers and signal Processing, 2003. PACRIM. 2003 IEEE Pacific Rim Conference on , vol.2, pp. 840- 843.