A-GHSOM: An adaptive growing hierarchical self organizing map for network anomaly detection


J. Parallel Distrib. Comput. 72 (2012) 1576–1590


A-GHSOM: An adaptive growing hierarchical self organizing map for network anomaly detection Dennis Ippoliti, Xiaobo Zhou ∗ Department of Computer Science, University of Colorado, Colorado Springs, 1420 Austin Bluffs Parkway, Colorado Springs, CO 80918, USA

Article info

Article history:
Received 13 October 2011
Received in revised form 9 August 2012
Accepted 5 September 2012
Available online 14 September 2012

Keywords: Network anomaly detection; Growing hierarchical self organizing map; Online adaptation; Detection accuracy; False positive rate

Abstract

The growing hierarchical self organizing map (GHSOM) has been shown to be an effective technique to facilitate anomaly detection. However, existing approaches based on GHSOM are not able to adapt online to ever-changing network traffic. This results in low accuracy in identifying intrusions, particularly ‘‘unknown’’ attacks. In this paper, we propose an adaptive GHSOM based approach (A-GHSOM) to network anomaly detection. It consists of four significant enhancements: enhanced threshold-based training, dynamic input normalization, feedback-based quantization error threshold adaptation, and prediction confidence filtering and forwarding. We first evaluate the A-GHSOM approach for intrusion detection using the KDD’99 dataset. Extensive experimental results demonstrate that compared with eight representative intrusion detection approaches, A-GHSOM achieves significant overall accuracy improvement and significant improvement in identifying ‘‘unknown’’ attacks while maintaining low false-positive rates. It achieves an overall accuracy of 99.63%, and 94.04% accuracy in identifying ‘‘unknown’’ attacks, while the false positive rate is 1.8%. To avoid drawing research results and conclusions solely based on experiments with the KDD dataset, we have also built a dataset (TD-Sim) that consists of a mixture of live trace data from the Lawrence Berkeley National Laboratory and simulated traffic based on our testbed network, ensuring adequate coverage of a variety of attacks. Performance evaluation with the TD-Sim dataset shows that A-GHSOM adapts to live traffic and achieves an overall accuracy rate of 97.12% while maintaining a false positive rate of 2.6%. © 2012 Elsevier Inc. All rights reserved.

1. Introduction An anomaly detection system attempts to detect intrusions by noting significant departures from the normal behavior [8,35]. It directly addresses the problem of detecting ‘‘unknown’’ attacks against systems. This is due to the fact that anomaly detection techniques do not scan for specific behaviors. They compare current activities against models of past behavior. By doing so, they are able to identify an unusual behavior and flag it as an intrusion attempt. Machine learning approaches have been used for network anomaly detection [1–3,10,14,15,23]. In particular, approaches based on self-organizing maps (SOMs) of artificial neural networks have shown effectiveness at identifying ‘‘unknown’’ attacks [19–21,30,31]. Those studies focused on the training process of SOMs [19], the exploration of training data composition [20,31], and proper feature selection for training sets and the effect of map topology [20]. However, the effectiveness of using traditional SOM models is limited by the static nature of the model architecture.



Corresponding author. E-mail address: [email protected] (X. Zhou).

0743-7315/$ – see front matter © 2012 Elsevier Inc. All rights reserved. doi:10.1016/j.jpdc.2012.09.004

The size and dimensionality of the SOM model is fixed prior to the training process and is determined by trial and error [20,31]. There is no systematic method for identifying an optimal configuration. One approach that attempts to overcome this issue is the growing hierarchical self organizing map (GHSOM) [13,29]. GHSOM is an SOM model that does not use a predetermined map topology. Instead, the size and the dimensionality of the map dynamically grow during the training process to optimally fit the training set based on user-defined parameters. It leads to a hierarchy of traditional SOMs of various sizes. The main advantage of a GHSOM over a traditional SOM is that trial and error are eliminated from the training process. An ideal topology is formed unsupervised based on the training data. Additionally, hierarchical relationships in the training data are discovered and modeled in the final configuration. However, although GHSOM has shown some promise in network intrusion detection [29], it does not account for concept drift. That is, although the topology is modeled to fit the training set, it is not adapted online to account for changes to live data that occur over time. There is no capability to adapt the model as live data is processed. Over time, subtle changes in legitimate traffic patterns as well as vulnerability to previously ‘‘unknown’’ attacks require updates to maintain accuracy.


In this paper, we propose a novel adaptive growing hierarchical self organizing map (A-GHSOM) that adapts online to changes in the input data over time. We do this with four significant enhancements. First, we develop a threshold-based training process. The training process expands the GHSOM model to fit the training set. Instead of using the mean quantization error as a control parameter, we establish a new parameter, the threshold error value, that is more suitable to the network intrusion detection domain. Second, we use a dynamic input normalization process that monitors the range of observed values of input connections and uses the information to adapt the map scale during normalization online. Third, we apply feedback-based quantization error threshold adaptation. Quantization error is a measure of how closely an evaluated connection fits its matched neuron in the map. We use an error threshold that adapts over time to identify new attacks that may be initially matched to ‘‘normal’’ nodes in a GHSOM but are indeed malicious. Finally, we design a prediction confidence filtering and forwarding mechanism to identify traffic patterns that are beyond the capability of a content-oblivious system to evaluate. These connections can be filtered or forwarded to a content-aware intrusion detection system for further evaluation. We conduct experiments using the KDD’99 dataset [17,24]. Many recent important studies [16,20,23,29,38] have used the dataset for network intrusion detection research since it is a major publicly available dataset. We use the dataset for performance comparison of A-GHSOM with the related approaches. Our baseline A-GHSOM approach achieves 92.21% accuracy and 0.58% false positives. Applying those four significant enhancements, the accuracy of the integrated A-GHSOM approach increases to 99.63% while the false positive rate is only slightly increased, to 1.8%.
It is able to identify a subset of connections that are virtually indistinguishable based on the data available in the KDD’99 dataset. By identifying those connections, it increases the accuracy of detecting previously ‘‘unknown’’ attacks to 94.04%. When examining the individual attack categories of Denial of Service (DoS), Probe, Remote to Local (R2L), and User to Root (U2R), it achieves accuracies of 99.82%, 99.58%, 92.66%, and 87.14%, respectively. Overall, compared to eight representative intrusion detection approaches, the A-GHSOM approach significantly increases the detection accuracy with a low false positive rate. To avoid drawing research results and conclusions solely based on experiments with the KDD’99 dataset, we have also built a dataset (TD-Sim) that consists of a mixture of live trace data from the Lawrence Berkeley National Laboratory and simulated traffic based on a testbed network, ensuring adequate coverage of a variety of network attacks. Performance evaluation with the TD-Sim dataset again shows that A-GHSOM effectively adapts to live traffic. It achieves an overall accuracy rate of 97.12% while maintaining a false positive rate of 2.6%. The novelty and significance of the proposed A-GHSOM technique is that it is able to adapt to dynamic changes during online monitoring and effectively deal with unknown attacks compared to the related techniques. The structure of this paper is as follows. Section 2 reviews related work. Section 3 presents the GHSOM model. Section 4 through Section 7 give the design of the four major technical enhancements: the threshold-based training process, dynamic input normalization, feedback-based threshold adaptation, and confidence filtering and forwarding. Section 8 focuses on the performance evaluation using the KDD’99 dataset. Sections 9 and 10 introduce the TD-Sim dataset and the performance evaluation using the new dataset, respectively. Section 11 concludes the paper with remarks on future work.


2. Related work Network intrusion detection has been an active area of research and development since 1987 [11]. There are generally two types of intrusion detection systems: anomaly detection and misuse detection. Misuse detection techniques attempt to model attacks on a system as specific patterns, and then systematically compare these patterns with an established rule set [6]. The process involves a specific encoding of previous behaviors and actions that were deemed intrusive or malicious. One advantage of misuse detection is that known attacks can be detected with significant reliability and with a low false positive rate. However, such techniques are not very effective at detecting attacks that have not previously been clearly defined and added to the rule set. Many approaches have used machine learning techniques to address the issue of anomaly detection. For instance, K-nearest neighbor techniques were used to compute the approximate distance between different data points, with vectors being classified by a majority vote of their neighbors [3,14,23]. Naive Bayes networks were used to calculate statistical probabilities that certain features can indicate traffic classification [32,33]. Decision tree methods were used to classify a sample through a sequence of decisions, where earlier decisions influence later ones [3,12]. Support vector machine approaches were applied to the intrusion detection problem [9,14,20,34,36]. Neural network approaches use computational models that try to simulate functional aspects of biological neural networks. The first use of a Kohonen self-organizing map in intrusion detection was described by Cannady and Mahaffey [6]. Since then, approaches based on SOMs have shown effectiveness at identifying both known and ‘‘unknown’’ attacks [19–21,30,31]. For instance, Sarasamma et al. [31] used a hierarchy network built on an SOM architecture with no neighborhood or transfer functions.
Experiments were conducted with a variety of random training sets. Five training sets were used, each consisting of a different distribution of connection classes. Each layer of the hierarchy was trained on individual and exclusive feature sets. Kayacik et al. [20] examined several approaches related to the application of SOMs to intrusion detection. Experiments were conducted with three different partitions of the training data: the entire 10% set, normal-only connections, and a filtered set consisting of equal numbers of attack vs. normal connections. That work also compared the effectiveness of using only 6 basic features to using all 41 features available in the sample data. Networks of differing complexity were tested. In those traditional SOM based approaches, the size and dimensionality of the model is fixed prior to the training process. Palomo et al. [29] used a GHSOM to overcome the limitation. A method for calculating quantization error based on both numeric and symbolic data was proposed. The map is adapted only during training. Once training is complete and live data is applied, no adaptation or new learning occurs. All of the surveyed studies are content oblivious. That is, they examine information related to network connections, but not the content of the data. Our A-GHSOM approach is also content oblivious. However, our approach differs from the existing approaches in four important ways. First, we design a threshold-based training process. Instead of using the mean quantization error as a control parameter, we establish a new parameter, the threshold error value, that is more suited to the intrusion detection problem. In [29], the mean quantization error is based on Euclidean distance and is the primary growth metric. That metric does not account for single-value anomalies in the connection data. In cases where most data values are similar and only one value is anomalous, the mean quantization error may fail to capture this anomaly.
The threshold error value, by contrast, remains effective for anomaly detection in such cases.
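The contrast between the two metrics can be illustrated with a small sketch. The per-feature error values and the threshold below are hypothetical, chosen only to mirror the scenario described above (many small discrepancies vs. one large one), not taken from the paper's experiments:

```python
# Sketch: mean quantization error vs. threshold error value (TEV).
# All numbers here are illustrative, not from the paper's dataset.

TAU_1 = 0.15  # hypothetical per-feature error threshold

def mean_qe(errors):
    """Mean quantization error over all features."""
    return sum(errors) / len(errors)

def tev(errors, tau=TAU_1):
    """Threshold error value: sum only the errors exceeding tau."""
    return sum(e for e in errors if e > tau)

conn_1 = [0.10] * 10          # many small discrepancies
conn_2 = [0.0] * 9 + [0.23]   # one large, anomalous discrepancy

# Mean QE ranks conn_1 as the worse match, hiding conn_2's anomaly;
# TEV ignores the sub-threshold noise and flags only conn_2.
print(mean_qe(conn_1), mean_qe(conn_2))
print(tev(conn_1), tev(conn_2))
```

Here the mean quantization error of the first vector is larger, even though only the second vector contains a genuinely anomalous feature, which is exactly the case the threshold error value is designed to catch.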


The second major difference is the addition of online adaptation for input normalization. Previous SOM-based approaches use offline training followed by online testing. Although various training sets and feature vectors are used during an unsupervised learning phase, once training is complete, the testing set is presented to the network for classification as-is. Adaptations are not made during the testing phase. Our approach uses statistical data gathered during testing to perform dynamic input normalization and increase accuracy. The third enhancement is the use of operator feedback to further adapt the model online. In [38], the authors developed a system for using operator feedback to tune prediction rules. Our approach also assumes that such a mechanism exists. However, we recognize that the corrective feedback will be limited. Our approach examines varying feedback probability rates, which are the likelihoods that an operator would notice an incorrect prediction and provide feedback to the model. Finally, although our approach does not examine the content of connections, it is able to identify connection patterns that indicate further examination is required to classify the connection. We have observed that several connections in the KDD dataset have identical vectors but not identical connection types. There is a subset where normal connections and attack connections are indistinguishable if only the values in the KDD dataset are considered. Our approach calculates a node confidence score and uses this score to identify potential connections to forward to an operator or content-aware system for further processing. We classify connections as ‘‘normal’’, ‘‘attack’’, or ‘‘forward’’. There are many other works on network intrusion detection. For example, Gupta et al. [16] recently addressed accuracy and efficiency using conditional random fields and a layered approach. Yu et al.
[38] designed an automatically tuning intrusion detection system, which controls the number of alarms output to the system operator and tunes the detection model on-the-fly according to feedback provided by the system operator when false predictions are identified. In this paper, we only compare A-GHSOM with its closely related techniques.

Fig. 1. A GHSOM architecture after off-line training.

3. Growing hierarchical self organizing map

In a GHSOM, the size and the dimensionality of the map architecture are determined during the training phase. Fig. 1 depicts a GHSOM architecture after off-line training. The initial map size is very small, usually a single layer of 2 × 2. During the training process, the map grows both vertically and horizontally until the training process is complete. Two configurable parameters are used, δE and δD. δE represents the target quantization error for the map, and δD represents the maximum dimensionality of a single layer. After each training iteration, the deviation of the input data (quantization error) is computed. The map grows horizontally by adding rows and columns to reduce quantization error. It grows vertically by adding child layers to parent layers that exceed the maximum dimensionality specified by δD. The process continues until the quantization error of the map is less than δE. It is important to note that, at the end of the training process, each layer and sub-layer can have a different number of maps and sub-maps with varying dimensionality.

GHSOMs have been used against high-dimensional data in neural networks [13] and in network intrusion detection [29]. In the work of [29], a GHSOM model with a new metric incorporating both numerical and symbolic data is proposed. An intrusion detection system monitors the IP packets flowing over the network to capture intrusions or anomalies. It builds statistical models using metrics derived from observation of the users' actions. In that work, the learning and adaptation is completed at the end of the off-line training process. Our A-GHSOM approach develops an online learning process using dynamic input normalization and feedback-based quantization error thresholds to adapt the system over time to changes in the input data. It also uses a confidence filtering and forwarding mechanism to identify traffic patterns that are beyond the capability of a content-oblivious system. In the following, Sections 4–7 give the design details for the four important components of the new A-GHSOM approach.

4. Threshold-based training process

The threshold-based training process applies to off-line training. It is executed in two cycles, a learning cycle and a growth cycle. During the learning cycle, the GHSOM nodes represent the various input patterns and weights for existing nodes are adjusted. During the growth cycle, the GHSOM architecture grows both horizontally and vertically; new nodes are added to the structure. The two cycles continuously alternate until the training process is complete. During the learning cycle, each input pattern j is represented by an n-dimensional input vector Vj,

Vj = (vj1, vj2, ..., vjn).    (1)

The initial map size is set to a single layer of 2 × 2. Each node j is assigned an n-dimensional weight vector Wj, which is initialized with random values as follows:

Wj = (wj1, wj2, ..., wjn).    (2)

The major steps of the learning cycle are as follows:

1. An input pattern is selected from the training set and presented to the map.
2. Each node in the map is examined to find the best matching unit.
3. The weights of the best matching unit and nodes in its neighborhood are adjusted to make them closer to the input vector.
4. Repeat the process until all input patterns are presented.

The weight vectors of the best matching unit and its neighborhood are adjusted such that for each node j in the neighborhood, Wj is adjusted according to

Wj = Wj + α(Vj − Wj)    (3)

where α represents the learning rate. Nodes in a single sublayer are added and arranged in rows and columns. We use a simple rectangular neighborhood scheme. The neighborhood of node k is defined by an enclosing square Nk, centered on node k and extending n nodes in each direction. During the weight adjustment, the initial size of the neighborhood (n) is set large (i.e., n = 10). The weights for each node in neighborhood Nk are adjusted according to Eq. (3). This adjustment process is repeated with an increasingly smaller neighborhood size until the neighborhood contains only the best matching unit, that is, Nk = {k}.
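The learning cycle above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the map size, learning rate, data, and shrinking schedule are all hypothetical, and the map is a flat grid rather than a full GHSOM hierarchy:

```python
import random

# Sketch of one learning-cycle step: find the best matching unit (BMU)
# and pull it and its rectangular neighborhood toward the input, per
# the weight update Wj = Wj + alpha * (Vj - Wj).

ALPHA = 0.5  # illustrative learning rate

def best_matching_unit(grid, v):
    """Return (row, col) of the node whose weight vector is closest to v."""
    return min(
        ((r, c) for r in range(len(grid)) for c in range(len(grid[0]))),
        key=lambda rc: sum((w - x) ** 2 for w, x in zip(grid[rc[0]][rc[1]], v)),
    )

def train_pattern(grid, v, n_start=2):
    """Present one input pattern, shrinking the neighborhood to the BMU."""
    br, bc = best_matching_unit(grid, v)
    for n in range(n_start, -1, -1):  # shrink until Nk = {k}
        for r in range(max(0, br - n), min(len(grid), br + n + 1)):
            for c in range(max(0, bc - n), min(len(grid[0]), bc + n + 1)):
                grid[r][c] = [w + ALPHA * (x - w) for w, x in zip(grid[r][c], v)]
    return br, bc

random.seed(0)
# A 2 x 2 starting layer of 3-dimensional weight vectors, as in the text.
grid = [[[random.random() for _ in range(3)] for _ in range(2)] for _ in range(2)]
train_pattern(grid, [1.0, 0.0, 0.5])
```

Repeatedly presenting the same pattern drives the best matching unit's weights toward the input vector, which is the intended effect of Eq. (3).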


Table 1
Effect of scale on normalized distance.

Minimum   Maximum   Value-1   Value-2   Normalized distance
1         16        8         14        0.4
1         32        8         14        0.194
1         64        8         14        0.095
1         128       8         14        0.047
1         256       8         14        0.023
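The distances in Table 1 follow from simple linear scaling onto [0, 1]. A short sketch (the helper name is ours; values agree with the table to about three decimal places, the last row differing only in rounding):

```python
def normalized_distance(a, b, lo, hi):
    """Distance between a and b after linear scaling onto [0, 1]."""
    return abs(a - b) / (hi - lo)

# Reproduce Table 1: the same pair (8, 14) looks very different on a
# narrow reference range and nearly identical on a wide one.
for hi in (16, 32, 64, 128, 256):
    print(hi, round(normalized_distance(8, 14, 1, hi), 3))
```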

After each learning cycle, the quantization error is analyzed and the topology of the GHSOM is adapted during a growing cycle. For each input pattern j, the quantization error vector QEj is presented as follows:

QEj = (qej1, qej2, ..., qejn)    (4)

where qejk = |wjk − vjk|, k = 1, 2, ..., n. The threshold error value is calculated using the quantization error vector as follows:

TEVj = Σ_{k=1}^{n} f(k)    (5)

f(k) = qejk if qejk > τ1, and f(k) = 0 if qejk ≤ τ1.    (6)

Here, τ1 is a parameter that controls how closely an element of an input vector must match an element of a weight vector before it is no longer factored into the threshold error value. The node with the highest threshold error value is considered the highest error node. The rationale behind Eq. (6) is that the input pattern of an anomalous connection often matches the input pattern of a normal connection very closely except for one or two parameters. Existing GHSOM models use parameters such as the mean quantization error of the vector, or the Euclidean distance between vectors [13,29]. However, they would not be as effective as using the threshold error value given by Eq. (6). For example, consider three vectors representing two connections and one weight vector. Connection-1 and Connection-2 are each compared to the weight vector. Fig. 2 shows the individual quantization errors for each data point in the vectors. If we were to use the mean quantization error or Euclidean distance to evaluate Connection-1 and Connection-2, we would find that both measures are higher for Connection-1. However, it is obvious that Connection-2 has an anomalous value in the vector. Instead, by using the threshold error value during training, the network learns to distinguish between two vectors that are very similar in the global vector space but may be very different in one or more sub-spaces. For example, if τ1 were set to 0.1 or 0.15, according to Eqs. (5) and (6), the TEV for Connection-1 would be 0 while the TEV for Connection-2 would be 0.23. Thus, Connection-2 is flagged as an anomaly. For effective anomaly detection, we are interested not only in aggregate small discrepancies, but also in identifying large single discrepancies.

Fig. 2. Quantization error vectors of two sample connections.

Once the highest error node is identified, its threshold error value is compared to the growing threshold δE. If it is greater than δE, the map is expanded. During expansion, the number of nodes in the sub-map containing the highest error node is compared to the growing threshold δD. If it is greater than δD, a new sub-map is created subordinate to the highest error node. If less, a new column or row of neurons is inserted between the highest error node and its most dissimilar neighbor. Their weight vectors are initialized as the mean of their neighbors' weight vectors.

5. Dynamic input normalization

The dynamic input normalization process is designed to emphasize the true difference between individual input vectors. The idea is to normalize input patterns to values between 0 and 1 based on the scale values of the system.

5.1. Data normalization and the effect of scale

In a GHSOM, data is normalized to account for differences in scale between two different data points. As an example, consider that the two data points are height and weight, height being 5∼6 ft and weight being 100∼200 pounds. Weight would always take precedence over height if the data were not normalized. We propose to account for changes in the data scale over time. By dynamically updating the scale of the input data, we are able to highlight the true distance between different values of the same feature. This is particularly useful when identifying ‘‘anomalous’’ behavior. To capture the relative distance between two different values of the same feature, we adjust the input pattern so that all values are in the range of 0 to 1, using simple linear scaling for the normalization. We consider the ‘‘distance’’ to be how different the two values are from one another. The smaller the distance, the more similar the values are considered. In Table 1, the reference range used to normalize input values varies from 1∼16 to 1∼256. For each reference range, two values, 8 and 14, are normalized based on the range and their normalized distance is calculated. When the reference range is 1∼16, the two values are considered very different from each other as the normalized distance is 0.4. When the reference range is 1∼256, they are considered very similar as the normalized distance is 0.023. In an intrusion detection problem domain, there are several data points for which the scale will be known ahead of time. Some points are in the scale of 0∼1, while some others are in 0∼255. However, in the KDD’99 dataset, 17 of the 42 data points have an undefined scale. Furthermore, the scale in the training data is not the same as the scale in the live data. Therefore, scaling done during the training process will not be accurate when applied to the live data.
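The consequence of a fixed training-time scale can be sketched concretely. The feature values and reference range below are hypothetical: a live value beyond anything seen in training normalizes far outside [0, 1], distorting every distance computed against the weight vectors:

```python
def scale(x, lo, hi):
    """Linear normalization onto [0, 1] using a fixed reference range."""
    return (x - lo) / (hi - lo) if hi != lo else 0.0

# Reference range frozen from a hypothetical training set.
train_min, train_max = 0, 1000

print(scale(500, train_min, train_max))    # 0.5, inside [0, 1]
# A live value beyond anything seen in training falls far outside [0, 1],
# so static training-time scaling no longer reflects true distances.
print(scale(50000, train_min, train_max))  # 50.0
```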
A-GHSOM uses an adaptive input normalization approach that automatically tunes the scaling parameters and weight vectors online based on the observed minimum and maximum values in the examined patterns.

5.2. Dynamic input normalization process

The KDD’99 data contains both numeric and symbolic values. In the dynamic input normalization process, numeric inputs are normalized to values between 0 and 1. Flag values are normalized to integer values greater than 1 so that the possible outcomes


Fig. 3. The relationship between the observed min/max values and the adapted min/max values: (a) examples of adapted maximum values; (b) examples of adapted minimum values.

of comparing two flag values are only differences equal to zero or greater than or equal to one. All other values are normalized according to the normalization function. When compared to the weight vector, differences close to 0 are considered extremely similar, with 0 being identical; differences close to 1 are considered very different (differences ≥ 1 are considered infinitely different). The normalization function F(x) is given as follows:

F(x) = (x − min(t)) / (max(t) − min(t)) if max(t) ≠ min(t), and F(x) = 0 if max(t) = min(t)    (7)

where x is a data point in the input pattern, and min(t) and max(t) are the expected minimum and maximum values for that data point at time t, respectively. Prior to training, the minimum and maximum values in the training set are calculated for each data point. These values are used to normalize the input patterns at time t. As the effect of scale can have a significant impact on identifying the variance, the ideal situation is to have the narrowest range possible to normalize the data and therefore amplify the true difference between vectors. Over time, if no input data is as low as the minimum, the minimum is raised; if no input data is as high as the maximum, the maximum is lowered. As live traffic is processed, at periodic intervals from time t to time t + n, where n is the number of records processed, the observed minimum and maximum values are recorded as obsmin(t) and obsmax(t), respectively. At time t, the minimum and maximum values are adjusted according to the following equations:

max(t + n) = min(t) + R(t) · e^(−α) + β · ∆max(t)    (8)

min(t + n) = max(t) − R(t) · e^(−α) − β · ∆min(t)    (9)

In Eqs. (8) and (9), we have

∆max(t) = obsmax(t) − max(t) if obsmax(t) > max(t), and 0 if obsmax(t) ≤ max(t)

∆min(t) = min(t) − obsmin(t) if obsmin(t) ≤ min(t), and 0 if obsmin(t) > min(t)

R(t) = max(t) − min(t).

Control parameters α and β have values between 0 and 1. Parameter α controls the exponential growth/decay rate. Parameter β controls how differences between the currently used min/max and the observed min/max are used. Because the training data is only a subset of the possible values in the live data, any scaling done during training will not be accurate when applied to live data. Our model implements an adaptive input normalization process by dynamically adjusting each data point's scale. Fig. 3(a) and (b) demonstrate the relationship between the observed min/max values and the adapted min/max values. They represent a subset of the ‘‘test-bytes’’ feature of the KDD dataset. The x-axis represents the time interval in which the measurement was recorded. The periodic interval is 5000 connections. The y-axis represents the min or max value in bytes. The plotted series represent the relationship between the observed values and the adapted values with α = 0.05 and β = 1.0. The adapted values are used for scaling during the normalization process.

6. Feedback-based threshold adaptation

A-GHSOM is further enhanced by the use of feedback-based quantization error threshold adaptation. It adaptively adjusts thresholds for each node as input patterns are applied and adds new nodes when appropriate. Each node is assigned two initial threshold parameters: τ1 is used to calculate the threshold error value; τ2 is used as an upper limit on the acceptable total quantization error and is used to calculate the quantization error boundary (QEB) for a selected node as follows:

QEBj = 0 if tqej < τ2, and QEBj = 1 if tqej ≥ τ2    (10)

where tqej = Σ_{k=1}^{n} qejk. During live intrusion detection, input patterns are applied to the A-GHSOM and the best matching unit is selected. The best matching unit is the node that best matches the input pattern. However, because of the size of the problem space, there is no guarantee that the best matching unit is a good match to the input pattern. The only guarantee is that it is a better match than all the other nodes in the neural network. The threshold error value and the quantization error boundary are used to determine if the connection being processed is within thresholds. If and only if both are identically equal to 0 is the pattern considered within thresholds. Each node in the final A-GHSOM is marked either ‘‘normal’’, ‘‘unmarked’’, or ‘‘attack’’. As patterns are examined, they are mapped to one of the three node types and considered within thresholds or not. The results are then used to make predictions identifying the suspected connection type, either ‘‘normal’’ or ‘‘attack’’. Any pattern that is not within thresholds is identified as a suspected attack. Even though its best matching unit may be labeled ‘‘normal’’, the fact that its threshold error value or its quantization error boundary is greater than 0 means the connection is anomalous in nature and it is assumed to be an attack. Likewise, any patterns mapped to a best matching unit


Table 2
Feedback-based threshold adaptation rules.

Best matching unit type   Within thresholds   Prediction   Result      Adaptation rules
Attack                    YES                 Attack       Correct     No action
Attack                    NO                  Attack       Correct     No action
Attack                    YES                 Attack       Incorrect   Grow network
Attack                    NO                  Attack       Incorrect   Grow network
Normal                    YES                 Normal       Correct     No action
Normal                    NO                  Attack       Correct     No action
Normal                    YES                 Normal       Incorrect   Lower τ1, τ2
Normal                    NO                  Attack       Incorrect   Raise τ1, τ2
Unmarked                  YES                 Attack       Correct     No action
Unmarked                  NO                  Attack       Correct     Mark node ‘‘attack’’
Unmarked                  YES                 Attack       Incorrect   Mark node ‘‘normal’’
Unmarked                  NO                  Attack       Incorrect   Mark node ‘‘normal’’ & raise τ1, τ2
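The rules in Table 2 amount to a lookup keyed on the best matching unit type, the within-thresholds test, and the feedback result. A minimal sketch (the function and action strings are our shorthand for the table entries, not the authors' code):

```python
# Sketch of Table 2 as a lookup table.
# Keys: (BMU type, within thresholds, feedback result).
ADAPTATION_RULES = {
    ("attack", True, "correct"): "no action",
    ("attack", False, "correct"): "no action",
    ("attack", True, "incorrect"): "grow network",
    ("attack", False, "incorrect"): "grow network",
    ("normal", True, "correct"): "no action",
    ("normal", False, "correct"): "no action",
    ("normal", True, "incorrect"): "lower tau1, tau2",
    ("normal", False, "incorrect"): "raise tau1, tau2",
    ("unmarked", True, "correct"): "no action",
    ("unmarked", False, "correct"): "mark node 'attack'",
    ("unmarked", True, "incorrect"): "mark node 'normal'",
    ("unmarked", False, "incorrect"): "mark node 'normal' & raise tau1, tau2",
}

def predict(bmu_type, within_thresholds):
    """Prediction logic implied by the table: only a 'normal' BMU that is
    within thresholds yields a 'normal' prediction; all else is 'attack'."""
    return "normal" if bmu_type == "normal" and within_thresholds else "attack"

action = ADAPTATION_RULES[("normal", False, "incorrect")]
```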

of type ‘‘unmarked’’ are patterns that have not been previously identified and are suspected attack. The actual connection type is then compared to the suspected connection type and a result of ‘‘correct’’ or ‘‘incorrect’’ is identified. The distinction of correct or incorrect is made based on operator feedback. It is assumed that to some degree, a network operator will be able to identify that an attack prediction was actually a false positive or that an attack was missed. We realize that it is not reasonable to expect the operator to respond to every error made in the system. In the absence of feedback, the system assumes that its prediction was correct. Table 2 gives the feedback-based adaptation rules. 7. Confidence filtering and forwarding

The fourth enhancement of the A-GHSOM approach is a prediction confidence filtering and forwarding mechanism. It monitors neuron consistency and accuracy and uses those measures to develop a neuron confidence rating. It then uses the rating to identify an appropriate action for the system to take regarding predictions made by nodes in which the system has low confidence. In a traditional GHSOM, individual neurons and regions of the map are organized to classify connections, and all connections matched to identical regions of the map are classified identically. In A-GHSOM, however, due to the addition of dynamic input normalization and feedback-based threshold adaptation, this is not the case. It is possible for two connections to be matched to an identical region of the map but be placed into different classification categories depending on their threshold error value and quantization error boundary value. Thus, under varying conditions, agitation of an individual neuron will prompt varying predictions. Neurons that are essentially trained to agitate on normal connections can indicate attack, and vice versa. We monitor the frequency with which each neuron predicts each connection class and use it to calculate a consistency rating. After each connection is processed at time t and a prediction is made by neuron j, consistency for the predicting node is calculated according to Eq. (13).

Attack_t^j = (λ · Attack_{t-1}^j + PREDICT_Attack) / (λ + 1)    (11)

Normal_t^j = (λ · Normal_{t-1}^j + PREDICT_Normal) / (λ + 1)    (12)

Consistency_t^j = max(Attack_t^j, Normal_t^j) / (Attack_t^j + Normal_t^j)    (13)

where λ is the size of the history, PREDICT_Attack = 1 if an attack is predicted and 0 otherwise, and PREDICT_Normal = 1 − PREDICT_Attack. Because A-GHSOM also expects feedback from a network operator, we are able to estimate an accuracy rating for each node. As the amount of feedback received by the system is limited, the system assumes that its predictions are correct in the absence of feedback. PREDICT_Accurate is equal to 0 if corrective feedback is received and 1 otherwise. Accuracy for predicting node j is calculated according to Eq. (14).

Acc_t^j = (λ · Acc_{t-1}^j + PREDICT_Accurate) / (λ + 1)    (14)

We use a combination of consistency and accuracy to calculate a confidence rating as follows.

Confidence_t^j = (λ · Confidence_{t-1}^j + Acc_t^j + Consistency_t^j) / (λ + 1)
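As a minimal sketch (not the authors' implementation; initial counter values are an assumption), Eqs. (11)–(14) and the confidence recursion can be written as running updates:

```python
def update_consistency(attack_prev, normal_prev, predict_attack, lam):
    """Eqs. (11)-(13): exponentially weighted prediction frequencies and
    the resulting consistency rating, with history size lam."""
    attack = (lam * attack_prev + predict_attack) / (lam + 1)        # Eq. (11)
    normal = (lam * normal_prev + (1 - predict_attack)) / (lam + 1)  # Eq. (12)
    consistency = max(attack, normal) / (attack + normal)            # Eq. (13)
    return attack, normal, consistency

def update_accuracy(acc_prev, predict_accurate, lam):
    """Eq. (14): predict_accurate is 0 on corrective feedback, else 1."""
    return (lam * acc_prev + predict_accurate) / (lam + 1)

def update_confidence(conf_prev, acc, consistency, lam):
    """Confidence recursion as printed; note that its fixed point is
    Acc + Consistency, so a practical implementation may renormalize."""
    return (lam * conf_prev + acc + consistency) / (lam + 1)
```

A node that always predicts the same class keeps consistency at 1; mixed predictions pull it toward 1/2.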

As the number of predictions by a node increases, three consistency scenarios can occur:

1. A node trained ''attack'' consistently makes ''attack'' predictions.
2. A node trained ''normal'' consistently makes ''normal'' predictions.
3. A node trained ''normal'' makes a significant number of ''attack'' predictions.

Situations 1 and 2 are not a concern if the accuracy is also high. If an ''attack'' neuron consistently makes inaccurate ''attack'' predictions, new neurons trained to predict ''normal'' will be added to that region, correcting the deficiency. Likewise, if a ''normal'' neuron consistently makes inaccurate ''normal'' predictions, new neurons trained to predict ''attack'' will be added to that region, and again the deficiency is corrected. The third situation, however, is a major concern. According to the adaptation rules of the A-GHSOM model, all neurons should eventually stabilize to a consistent state. This assumes, however, that normal connections and attack connections are differentiable based on the available data. If normal connections and attack connections with identical or extremely similar connection vectors exist, the adaptation process will stall and inconsistent predictions will be made. Consider a scenario where several identical but mismatching patterns are processed. The system first receives feedback that the pattern is ''normal'' and the adaptation is executed. Then, upon receiving an identical pattern that is ''attack'', conflicting feedback undoes the original adaptation. Over time, this inconsistent feedback stalls adaptation and prevents the model from settling into a consistent state. We identify nodes that have low confidence and exclude the traffic that they are processing. Indeed, our experiments have found that these nodes are identifying traffic that is not differentiable within the available data.
Systems that have access to information such as data content are better suited to accurately identify those connections. This is where we need to turn to content-aware intrusion detection.

Table 3
Metric definitions.

Predicted class   Actual class   Recorded result
Normal            Attack         Missed attack
Attack            Attack         Correct attack
Normal            Normal         Correct normal
Attack            Normal         False positive

Table 4
Accuracy of the baseline A-GHSOM.

Category   Accuracy (%)
Overall    92.21
Known      92.81
Unknown    10.95
DOS        98.48
R2L        7.66
U2R        51.43
Probe      78.83

Table 5
Accuracy of A-GHSOM with threshold-based training.

Category   Accuracy (%)
Overall    93.53
Known      99.37
Unknown    21.27
DOS        97.59
R2L        35.11
U2R        84.29
Probe      99.06

8. Experimental results and analysis using the KDD'99 dataset

First, we use the KDD'99 dataset in the experiments for comparison with closely related approaches that also used the KDD'99 dataset for performance evaluation. The KDD'99 dataset contains records describing TCP connections. Each connection is represented by 41 features. The dataset includes normal connections as well as 23 different types of attacks belonging to four categories: Denial of Service (DOS), Probe attacks, User to Root (U2R), and Remote to Local (R2L). The dataset includes a training set and two test sets: the ''whole'' set and the ''corrected'' set. The ''corrected'' set includes 17 attack types that are not sampled in the training set and thus are ''unknown'' to a trained model. For all experiments, we trained the A-GHSOM model on the training set and tested it against the ''corrected'' set.

For each connection, a decision is made to process the connection or to forward it to the operator based on system confidence. For each connection processed, a classification of ''attack'' or ''normal'' is made. Table 3 gives the result matrix. The performance metrics are the accuracy rate and the false positive rate. The accuracy rate is the ratio of the total number of correct attack predictions over the total number of attacks processed. The false positive rate is the ratio of the total number of false positive predictions over the total number of normal connections processed.

To establish a baseline for performance comparison, we processed the corrected dataset without applying any of the four enhancements. In this configuration, we trained the A-GHSOM map using δE and δD values of 0.5 and 100, respectively. Without thresholds and adaptations, connections are simply classified based on the node they are mapped to. Table 4 gives the overall accuracy rate, individual rates for each attack category, and rates for unknown and known attacks. The overall false positive rate is 0.58%.
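The two metrics, as defined above, reduce to simple counts over (predicted, actual) pairs from the result matrix of Table 3; a small sketch:

```python
def detection_rates(results):
    """results: iterable of (predicted, actual) pairs, each 'attack' or
    'normal'. Returns (accuracy rate, false positive rate): correct attack
    predictions over attacks processed, and false positives over normal
    connections processed."""
    results = list(results)
    attacks = sum(1 for _, a in results if a == "attack")
    normals = sum(1 for _, a in results if a == "normal")
    correct_attack = sum(1 for p, a in results if p == "attack" and a == "attack")
    false_pos = sum(1 for p, a in results if p == "attack" and a == "normal")
    return correct_attack / attacks, false_pos / normals
```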
Next, step by step, we study the impact of the four enhancements of the A-GHSOM approach on the performance of intrusion detection.

8.1. Impact of the threshold-based training

In this experiment, the threshold-based training process is applied, but without adaptations. Furthermore, dynamic input normalization is not used and no operator feedback is assumed. Initial default threshold values for τ1 and τ2 are set to 0.5 and 1.0, but not adapted dynamically. Table 5 shows the overall accuracy rate, individual rates for each attack category, and rates for unknown and known attacks. We observe that without any adaptation, the overall accuracy improves slightly, while significant improvements are achieved in the R2L, U2R, PROBE, and Unknown categories. The improvements come at a cost to the false positive rate, which increases to 1.73%.

8.2. Impact of the dynamic input normalization

In this experiment, we study the impact of the dynamic input normalization on the performance of the A-GHSOM approach. We dynamically change the normalization control parameters α and β from 0.01 to 0.09 every 15,000 connections. The parameters control how aggressively the dynamic normalization scale is adapted. The more aggressive the adaptation, the larger the absolute variance between two vectors will be. With the absolute variance between vectors increased, connections are more likely to produce quantization error profiles that exceed thresholds. The overall objective of this technique is to amplify subtle differences between connections, particularly ''unknown'' connections, and then use this amplification to increase overall intrusion detection accuracy.

Fig. 4 shows that the system with this enhancement improves the accuracy significantly for all four attack categories: DOS, Probe attacks, U2R, and R2L. However, the A-GHSOM model is too sensitive to the normalization parameters, and the increased accuracy comes at the cost of a higher false positive rate. Fig. 5 depicts the relative operating characteristics (ROC) diagrams that show the relationship between the accuracy and the false positive rate for the four attack categories. It shows that the false positive rate can get out of control. This is because the dynamic input normalization amplifies the variance between vectors: small differences can register high quantization errors, and the threshold-based training process interprets those high errors as anomalous traffic. While many more attacks are identified correctly, more normal connections are flagged as attacks. Next, we demonstrate how the third enhancement of the A-GHSOM approach, feedback-based threshold adaptation, makes a significant impact on maintaining high accuracy while reducing the false positive rate.

8.3. Impact of the feedback-based threshold adaptation

Until now, the system has been adapting autonomously; no expert feedback was used. Next, operator feedback is used to further enhance the accuracy of intrusion detection. We use a feedback probability control parameter to determine the likelihood that the system will be notified that it has made an error. We acknowledge that under normal operating conditions, it is unlikely that an operator will be able to respond to every alarm, or to identify every missed attack, in a timely manner. In the experiments, the feedback probability is the percentage of the system's mistakes (missed attacks or false positives) on which the operator is able to provide feedback. In the absence of feedback, the system assumes its prediction was correct. Feedback simply consists of notifications that a prediction was incorrect. The limits of operator feedback were discussed in [38]; in that work, corrections were limited to 304. We have found that significant improvement can be achieved with as little as 1% connection feedback, which equals 282 connections. With the adaptive thresholds, we are able to capitalize on the dynamic input

Fig. 4. Impact of the dynamic input normalization: (a) on DOS accuracy; (b) on U2R accuracy; (c) on R2L accuracy; (d) on PROBE accuracy.

Fig. 5. ROC diagrams of A-GHSOM with the dynamic input normalization.

normalization and bring false positives down to more reasonable levels. When the system receives feedback that it has made an incorrect prediction, the thresholds are adjusted so that future similar vectors are more likely to be handled correctly. This adjustment is made on a per-node basis, so that only nodes making inaccurate predictions are adapted. In the experiment, the feedback probability varies from 1% to 10%. Dynamic input normalization parameters α and β are set to 0.04 per every 15,000 connections. Fig. 6(a) and (b) show that with the feedback-based threshold adaptation, the accuracy of intrusion detection is high in every attack category. As more feedback is received, A-GHSOM learns more appropriate thresholds and is able to differentiate normal traffic from anomalous traffic more accurately. Fig. 7 shows the false positive rate with the feedback-based threshold adaptation. Compared to the system that does not use the feedback-based threshold adaptation, the false positive rate is much lower. However, it varies around 8%–10%, which is still higher than desired.

8.4. Impact of the confidence filtering and forwarding

Although more feedback reduces the false positive rate, we found that even with 100% connection feedback, the A-GHSOM approach has a false positive rate around 8% while producing excellent accuracy for both known and unknown attacks. We found that approximately 85% of the false positives were predicted by nodes that identified connection patterns not differentiable within the KDD dataset. We now apply the confidence filtering and forwarding mechanism to maintain high accuracy while further reducing false positives.

Figs. 8 and 9 depict the cumulative frequency distribution and the frequency distribution, respectively, for the confidence levels of nodes producing correct attack, correct normal, missed attack, and false positive predictions. The cumulative frequency distribution shows that 93% of the predictions are made by nodes with confidence levels greater than 95%. The frequency distribution chart shows that there are two confidence rating concentrations, one at the interval [77%, 81%] and the other at the interval [95%, 100%]. Additionally, the concentration at the interval [77%, 81%] accounts for approximately 7% of the total connections. We observe that the small percentage of predictions made by nodes with low confidence accounts for 65% of the missed attacks and 84% of the false positive predictions. Furthermore, examination of the low confidence predictions reveals that 89% of the false



Fig. 6. Impact of the feedback-based threshold adaptation on detection accuracy: (a) accuracy by attack categories; (b) accuracy by known and unknown attacks.

Fig. 7. Impact of the feedback-based threshold adaptation on the false positive rate.

Fig. 9. Confidence frequency distribution.

Fig. 8. Confidence cumulative distribution.

positive predictions were made by neurons that predicted either normal or R2L attacks. This is due to the similarity of R2L vector patterns to the ''normal'' vector patterns. By using consistency and accuracy to establish a confidence rating, we are able to identify connections that must be examined further before a determination can be made. A thorough examination of the test set revealed that of the 16,324 R2L connections, 10,033 had identical matching ''normal'' connections. 89% of these indiscriminate connections were identified for forwarding by the enhanced A-GHSOM approach. In practice, these connections could be quarantined and forwarded to a second tier intrusion detection system that has access to additional discriminating information such as the data content. Note that it is beyond the scope of this work to develop an additional tier system. In this work, we identify potential connections and consider them neither accurate nor inaccurate. In all cases, over 91% of the processed traffic is classified by the enhanced A-GHSOM approach, which does not require forwarding.

8.5. Impact of the integrated A-GHSOM approach

Now we have integrated all four important enhancements in the A-GHSOM approach. Fig. 10(a) and (b) demonstrate the impact of the integrated approach on intrusion detection accuracy. Fig. 11 shows the impact of the confidence forwarding on the false positive rate. The system is adapted every 20,000 connections using different parameters. Input normalization parameters α and β are set to 0.03, and the confidence forwarding probability is 80%. In the experiment, 91.9% of the connections are classified by the integrated approach. 8.1% of the connections are selected for forwarding. The integrated approach achieves an overall accuracy rate of 99.63% while maintaining the false positive rate of 1.8%. All four categories of attacks are predicted with above 87% accuracy. We see the sharpest increase in the accuracy of R2L detection. This is due to the fact that, as the system receives more feedback, it becomes less confident in the nodes processing R2L connections and thus is able to more accurately identify potential vectors for forwarding.
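The per-connection routing used throughout these experiments — classify with the map unless the predicting node's confidence is too low, in which case the connection is forwarded for deeper (e.g. content-aware) inspection — can be sketched as follows; the 0.95 cutoff is illustrative, not the paper's exact mechanism:

```python
def route_connection(node_confidence, prediction, cutoff=0.95):
    """Forward traffic handled by low-confidence nodes to a second-tier
    system; otherwise emit the map's classification."""
    if node_confidence < cutoff:
        return ("forward", None)   # defer: header data alone is not enough
    return ("classify", prediction)
```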

Fig. 10. The impact of the integrated A-GHSOM approach on intrusion detection accuracy: (a) accuracy by attack categories; (b) accuracy by known and unknown attacks.

A-GHSOM is able to make an efficient tradeoff, and it is adaptive to the changing problem domain of network intrusion.

9. TD-Sim: a dataset with trace data and simulated data

Fig. 11. The impact of the confidence forwarding on the false positive rate.

Table 6
Summary of performance comparisons of A-GHSOM with other representative approaches.

Approach                              Accuracy (%)   False positive (%)
A-GHSOM                               99.63          1.8
GHSOM [29], year 2008                 90.87          2.69
Bouzida et al. [3], year 2004         91.89          0.48
Sarasamma et al. [31], year 2005      93.46          3.99
Kayacik et al. [20], year 2007        90.4           1.38
Eskin data mining [14], year 2002     90.00          2.0
Eskin clustering [14], year 2002      93.00          10.0
Eskin SVM [14], year 2002             98.00          10.0
Yu et al. [38], year 2008             96.02          4.92

The integrated approach is exceptional at identifying previously ''unknown'' attacks. It is intuitive that with excessive feedback, the accuracy on ''unknown'' attacks would increase. However, the approach does not require excessive feedback: with the system being corrected on just 282 predictions, it achieves 94.04% accuracy on ''unknown'' attacks. As a summary, Table 6 compares the performance of the integrated A-GHSOM approach with eight representative machine learning based approaches. A-GHSOM is consistently able to produce higher accuracy rates while maintaining low false positive rates. Indeed, its false positive rate is lower than that of all approaches except the two proposed in [3,20]. Note that its false positive rate is only slightly higher than theirs (by 1.32% and 0.42%), while its accuracy, the major performance metric, is almost 10% higher. Accuracy and false positive rate are always a tradeoff.

So far, we have used the KDD'99 dataset to compare the A-GHSOM approach with eight representative approaches that also used the dataset for performance evaluation. There have been critiques or arguments made against the use of the KDD dataset [4,5,25,26]. One major issue is that no validation has been done to ensure the KDD dataset is in fact similar to real network traffic [4]. Additionally, the distribution of attacks in the KDD dataset is said to be unrealistic. Due to the way the traffic in the dataset was generated, some of the differences between attack traffic and normal traffic are statistically obvious. The dataset is also rather dated. To avoid drawing research results and conclusions solely based on experiments with the KDD dataset, we have built a dataset (TD-Sim), which consists of a mixture of live trace data and simulated network traffic.

TD-Sim was constructed in three steps. We first addressed the issue of similarity between the dataset and live data by using live trace data as the foundation. We started with a publicly available trace dataset and performed preprocessing to correct header/payload discrepancies created during packet anonymization. We then processed the dataset through an intrusion detection system to categorize attack vs. normal traffic. Finally, we inserted simulated traffic to ensure adequate coverage of a variety of network attacks.

The packet trace dataset we used as a base for TD-Sim is from the Lawrence Berkeley National Laboratory (LBNL) [22]. The data was collected from Oct 2004 through Jan 2005. The database includes 11 GB of packet header traces in 131 files. Each daily trace covers between ten minutes and one hour. In this dataset, the payload field is deleted and the IP address field is anonymized. Because the payload field was deleted, size calculations in the packet headers were not consistent with the actual packet sizes.
We used the tool Wireshark [37] to pad the payload fields to match the packet header calculations. We chose this method over recalculation in order to preserve the original packet size and data transfer information. As the baseline for identifying malicious activities, we processed the trace dataset through an intrusion detection system (IDS). We selected the Snort [7] IDS because it is publicly available and widely used. We used the official ''Sourcefire VRT Certified Rules (registered user release)'' for version 2.8.5. We processed the dataset with all available rules engaged. The alerts generated


Table 7
Composition of attack traffic in the TD-Sim dataset.

Attack type   Portion of total attacks (%)
Misc          7.04
Probe         49
DoS           14.12
U2R           14.91
R2L           14.93

Table 8
Accuracy of A-GHSOM without adaptation.

Category   Accuracy (%)
Overall    64.72
Trace      71.14
Known      68.19
Unknown    53.76
DOS        66.27
R2L        20.83
U2R        86.75
Probe      85.13
Misc       47.59

by Snort were separated into the same categories used by the KDD dataset (DOS, PROBE, U2R, and R2L). Additionally, we used a ''Miscellaneous Activity'' category to capture behaviors that Snort identified as violating security rules without identifying a specific attack. Connections in this category are potential warning signs of malicious behavior. Examples include protocol mismatch issues, destination unreachable alerts, overlapping TCP packet thresholds, etc.

After processing the trace data, we discovered that the Snort alerts consisted primarily of Misc, DoS, and Probe alerts, with no R2L alerts and less than 1% of the alerts suspected U2R. In order to ensure adequate coverage of attack types, we generated and inserted simulated traffic into TD-Sim. We did so by setting up a testbed network consisting of clients and servers. Clients included a mix of Windows XP, Windows 2000, and Ubuntu operating systems. Servers included a mix of Solaris, Windows 2000 Server, and Ubuntu operating systems, as well as dedicated database, web, file, and mail servers and Windows Domain and DNS services. We generated normal traffic by making HTTP requests, file transfers, normal logons, etc. We then generated attack traffic using the MetaSploit [27] and NMAP [28] toolkits. We repeatedly performed network scans and brute force compromise attempts. We also repeatedly performed all exploits available for the target operating systems. Levels of aggression available in the toolkits include ''paranoid'', ''sneaky'', ''polite'', ''normal'', ''aggressive'', and ''insane''. The generated connections were randomly inserted into the trace data. Note that the majority of the records in the dataset are from the trace data: over 70% of the connections used in the TD-Sim dataset are from the trace data obtained from LBNL. The TD-Sim dataset also addresses the distribution of attack connections.
In the KDD ‘‘corrected’’ dataset, 74% of all records are DoS attacks and only 19.48% of the records are normal connections. Further, R2L attacks make up less than 5% and U2R account for less than 1%. It has become common practice using the KDD dataset to test an intrusion detection system to include an overall success rate. However, when run against the KDD dataset, an intrusion detection system that is tuned to identify DoS and Probe can fail to identify all occurrences of U2R or R2L and still maintain a high overall success rate. To address this issue, in the TD-Sim dataset, attack traffic accounts for less than 30% of all connections and the distribution of attacks is more equitable. Table 7 shows the mix of attack types in the TD-Sim dataset. 10. Experimental results and analysis using the TD-Sim dataset As in the experiments with the KDD dataset, we used a small random sample of the TD-Sim dataset as the training set and the whole dataset as the testing set. We conducted experiments with the TD-Sim dataset in a way similar to that conducted with the KDD dataset. We include experimental results for the Miscellaneous category and a Trace category. While we made every effort to seamlessly integrate our simulated traffic with the trace data, it is impossible to eliminate all artificiality. Trace only results are experimental results that exclude the use of simulated traffic.

10.1. Impact of the threshold-based training

In this experiment, the threshold-based training process is applied, but without adaptations. Furthermore, dynamic input normalization is not used and no operator feedback is assumed. Initial default threshold values for τ1 and τ2 are set to 0.5 and 1.0, but not adapted dynamically. Table 8 gives the overall accuracy, individual accuracy for each attack category, and accuracy in detecting unknown and known attacks. We observed that compared to the KDD experiments, initial performance is not as sound. The overall accuracy is 64.72%, while the false alarm rate is high at 6.7%. We attribute this to the fact that in the TD-Sim dataset, many attacks are not statistically obvious. Additionally, the percentage of attacks in the R2L and U2R categories is significantly higher than in the KDD dataset. These attacks, as well as attacks in the Misc category, are extremely challenging to identify via header data alone. However, our analysis shows that the four techniques of the A-GHSOM approach are highly effective at improving accuracy and reducing false positives.

10.2. Impact of the dynamic input normalization

With the TD-Sim dataset, we reexamine the impact of the dynamic input normalization on the performance of the A-GHSOM approach. The more aggressive the adaptation, the larger the absolute variance between two vectors will be. Recall that α controls how quickly the scale decays, while β controls how quickly the system responds to the observed scale. We observe that while minor adjustments to both α and β are effective in the KDD dataset based experiments, more aggressive adjustments to the β parameter are effective with the TD-Sim dataset. However, when we applied aggressive adjustments to both α and β, experimental results quickly converge to 100% accuracy with a 100% false positive rate. With minor adjustments to α and aggressive adjustments to β, we achieve significant improvement in accuracy.
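One possible reading of the α/β roles, an assumption for illustration rather than the paper's exact update rule, is a per-feature running scale that decays at rate α and moves toward a newly observed larger value at rate β:

```python
def update_scale(scale, observed, alpha, beta):
    """Hypothetical sketch: alpha decays the running normalization scale,
    beta controls how fast it responds to a larger observed value."""
    scale = (1.0 - alpha) * scale           # decay at rate alpha
    if observed > scale:
        scale += beta * (observed - scale)  # respond at rate beta
    return scale

def normalize(value, scale):
    """Scale a raw feature value into [0, 1] relative to the running scale."""
    return min(value / scale, 1.0) if scale > 0 else 0.0
```

Under this sketch, a large α shrinks the scale quickly (amplifying small feature values), while a large β lets the scale chase transient spikes, which matches the observation that aggressive settings of both drive every connection over threshold.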
We dynamically change the normalization control parameters α and β every 5000 connections. α is adjusted from 0.01 to 0.05 while β is adjusted from 0.1 to 0.5. Fig. 12(a) and (b) show that the enhanced system improves overall accuracy significantly for all attack categories. However, we again see that false positive rates quickly rise. Fig. 13 depicts ROC diagrams where the false positive rate can get out of control and the system needs the feedback-based threshold adaptation.

10.3. Impact of the feedback-based threshold adaptation

In this experiment, the feedback probability varies from 1% to 10%. Dynamic input normalization parameters α and β are set to 0.01 and 0.5 respectively and are adapted every 5000 connections. Fig. 14(a) and (b) show that with the feedback-based threshold adaptation, the accuracy of intrusion detection is high in the DoS,

Fig. 12. The impact of the dynamic input normalization on detection accuracy: (a) accuracy by attack categories; (b) accuracy for known and unknown attacks.

Fig. 13. ROC diagrams of A-GHSOM with the dynamic input normalization.

Probe, and U2R categories. The accuracy of intrusion detection in R2L and Misc is in the medium range. However, we still observe that overall, the accuracy of intrusion detection for both known and unknown attacks exceeds 80%. Fig. 15 shows the false positive rate with the feedback-based threshold adaptation. We observe that the false positive rate is relatively lower than in the KDD dataset experiments. However, it still varies between 5% and 8%. We again see the need for confidence filtering and forwarding.

Fig. 14. The impact of the feedback-based threshold adaptation on detection accuracy: (a) accuracy by attack categories; (b) accuracy for known and unknown attacks.

Fig. 15. The impact of the feedback-based threshold adaptation on the false positive rate.

10.4. Impact of the confidence filtering and forwarding

We now apply confidence filtering and forwarding to the trace dataset. As in our analysis of the KDD dataset experiments, we have found a large number of records that are not differentiable by examining header information alone. However, confidence filtering is able to identify those records and forward them for additional processing.


Fig. 16. Confidence cumulative distribution.

Fig. 19. The impact of the confidence forwarding on the false positive rate.

We observe that only 12% of the total traffic is processed by nodes with less than 95% confidence. However, those nodes generate almost 50% of the false positive predictions and account for 72% of the missed attacks. We further observe that with confidence filtering applied, 21.6% of the processed traffic is signaled for forwarding. 88% of that traffic is attack traffic, with DoS and R2L making up 70% of the forwarded activity.

10.5. Impact of the integrated A-GHSOM approach

Fig. 17. Confidence frequency distribution.

Figs. 16 and 17 depict the cumulative frequency distribution and the frequency distribution, respectively, for the confidence levels of nodes producing correct attack, correct normal, missed attack, and false positive predictions. The cumulative frequency distribution shows that 87% of the predictions are made by nodes with confidence levels greater than 95%. The frequency distribution chart shows that there are three confidence rating concentrations, one at the interval [80%, 85%], one at the interval [87%, 92%] and another at the interval [95%, 100%].

Fig. 18. The impact of the integrated A-GHSOM approach on intrusion detection accuracy: (a) accuracy by attack categories; (b) accuracy by known and unknown attacks.

Now we have integrated all four important enhancements in the A-GHSOM approach. Fig. 18(a) and (b) show the impact of the integrated A-GHSOM approach on anomaly detection using the TD-Sim dataset. Fig. 19 shows the impact of the confidence forwarding on the false positive rate. The system is adapted every 5000 connections using different parameters. Input normalization parameters α and β are set to 0.01 and 0.5 respectively, and the confidence forwarding probability is 95%. In the experiment, 79.4% of the connections are classified by the integrated approach. 21.6% of the connections are selected for forwarding. The integrated approach achieves an overall accuracy rate of 97.12% while maintaining the false positive rate of 2.6%. DoS, Probe, and U2R categories are all predicted with above 97% accuracy, miscellaneous activity is predicted with 85.8% accuracy, while R2L declines to 47.07% accuracy. This decline can be explained by examining the composition of the forwarded traffic. As noted in the previous section, a large portion of the R2L records are signaled for forwarding. While we leave the details of this


processing for future work, we note that the accuracy rates of R2L can be significantly improved by ensuring records selected for further processing are handled effectively.

11. Conclusions and future work

In this paper, we propose and develop an adaptive growing hierarchical self organizing map (A-GHSOM) approach to network intrusion detection. Its major significance lies in its online adaptation to the ever-changing problem domain of network intrusion and in achieving very high accuracy in identifying network intrusions, particularly ''unknown'' attacks. It consists of four important enhancements to the GHSOM model: an enhanced threshold-based training process, a dynamic input normalization process, feedback-based quantization-error threshold adaptation, and a prediction confidence filtering and forwarding mechanism. To avoid drawing research results and conclusions solely based on experiments with the relatively dated KDD'99 dataset, we have also developed a dataset (TD-Sim) that consists of a mixture of live trace data from the Lawrence Berkeley National Laboratory and simulated traffic based on our testbed network, ensuring adequate coverage of a variety of attacks.

We have conducted extensive experiments using the KDD'99 dataset and the TD-Sim dataset. The results demonstrate the significant impact of each enhancement on intrusion detection performance. Compared to eight representative intrusion detection techniques that were evaluated with the KDD'99 dataset, the integrated A-GHSOM approach achieves significantly higher accuracy with a low false positive rate. Those approaches that can achieve accuracy rates close to A-GHSOM do so with much higher false positive rates. A-GHSOM adapts to live traffic and achieves an efficient tradeoff between accuracy and false positive rate. With the use of the TD-Sim dataset, A-GHSOM achieves an overall accuracy rate of 97.12% while maintaining a false positive rate of 2.6%.
In this work, parameter values such as α and β are tuned manually via trial and error. We started with very cautious values and then gradually increased aggressiveness until the gain was negligible or performance degraded. We have observed that different values perform very differently in various environments. Also, multiple threshold error values can be explored for different features in the input pattern. In future work, we will explore systematic methods for tuning and optimizing those parameters automatically and online.

Acknowledgments

This research was supported in part by US National Science Foundation CAREER Award CNS-0844983 and research grant CNS-1217979. A preliminary version of the paper appeared in [18].

References

[1] N. Abe, B. Zadrozny, J. Langford, Outlier detection by active learning, in: Proceedings of the 12th ACM SIGKDD Int’l Conference on Knowledge Discovery and Data Mining, KDD, 2006.
[2] C.C. Aggarwal, P.S. Yu, Outlier detection with uncertain data, in: Proceedings of the SIAM Int’l Conference on Data Mining, SDM, 2008.
[3] Y. Bouzida, F. Cuppens, N. Cuppens-Boulahia, S. Gombault, Efficient intrusion detection using principal component analysis, in: Proceedings of the 3eme Conference sur la Securite et Architectures Reseaux, SAR, 2004.
[4] S.-T. Brugger, KDD Cup’99 dataset considered harmful, Position Letter, KD Nuggets newsletter, vol. 07 (18), 2007.
[5] S.-T. Brugger, J. Chow, An assessment of the DARPA IDS evaluation dataset using Snort, Technical Report CSE-2007-1, University of California, Davis, Department of Computer Science, Davis, CA, 2007.
[6] J. Cannady, J. Mahaffey, The application of artificial intelligence to misuse detection, in: Proceedings of the First Recent Advances in Intrusion Detection Conference, RAID, 1998.
[7] B. Caswell, M. Roesch, Snort: the open source network intrusion detection system, 2004. http://www.snort.org/.


[8] V. Chandola, A. Banerjee, V. Kumar, Anomaly detection: a survey, ACM Computing Surveys 41 (3) (2009).
[9] W.-H. Chen, S.-H. Hsu, H.-P. Shen, Application of SVM and ANN for intrusion detection, Computers and Operations Research 32 (2005) 2617–2634.
[10] K. Das, J. Schneider, Detecting anomalous records in categorical datasets, in: Proceedings of the 13th ACM SIGKDD Int’l Conference on Knowledge Discovery and Data Mining, KDD, 2007.
[11] D. Denning, An intrusion-detection model, IEEE Transactions on Software Engineering 13 (2) (1987).
[12] O. Depren, M. Topallar, E. Anarim, M.K. Ciliz, An intelligent intrusion detection system for anomaly and misuse detection in computer networks, Expert Systems with Applications 29 (2005) 713–722.
[13] M. Dittenbach, D. Merkl, A. Rauber, The growing hierarchical self organizing map, in: Proceedings of the Int’l Joint Conference on Neural Networks, 2000.
[14] E. Eskin, A. Arnold, M. Prerau, L. Portnoy, S. Stolfo, A geometric framework for unsupervised anomaly detection: detecting intrusions in unlabeled data, in: Applications of Data Mining in Computer Security, 2002 (Chapter 4).
[15] V. Golovko, P. Kachurka, L. Vaitsekhovich, Neural network ensembles for intrusion detection, in: Proceedings of the IEEE Int’l Workshop on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications, 2007.
[16] K.K. Gupta, B. Nath, R. Kotagiri, Layered approach using conditional random fields for intrusion detection, IEEE Transactions on Dependable and Secure Computing 7 (1) (2010) 35–49.
[17] S. Hettich, S. Bay, The UCI KDD archive, University of California, Irvine, 1999. http://kdd.ics.uci.edu.
[18] D. Ippoliti, X. Zhou, An adaptive growing hierarchical self organizing map for network intrusion detection, in: Proceedings of the 19th IEEE Int’l Conference on Computer Communications and Networks, ICCCN, 2010, pp. 1–7.
[19] D. Jiang, Y. Yang, M. Xia, Research on intrusion detection based on an improved SOM neural network, in: Proceedings of the Fifth Int’l Conference on Information Assurance and Security, 2009.
[20] H.G. Kayacik, Z.-H. Nur, M.I. Heywood, A hierarchical SOM-based intrusion detection system, Engineering Applications of Artificial Intelligence 20 (2007).
[21] H.G. Kayacik, Z.-H. Nur, M.I. Heywood, On the capability of an SOM based intrusion detection system, in: Proceedings of the Int’l Joint Conference on Neural Networks, vol. 3, 2003.
[22] LBNL/ICSI enterprise tracing project. http://www.icir.org/enterprise-tracing/.
[23] Y. Li, B. Fang, L. Guo, Y. Chen, Network anomaly detection based on TCM-KNN algorithm, in: Proceedings of the ACM Symposium on Information, Computer and Communications Security, 2007.
[24] R.P. Lippmann, D.J. Fried, I. Graf, J.W. Haines, K. Kendall, D. McClung, D. Weber, S. Webster, D. Wyschogrod, R.K. Cunningham, M. Zissman, Evaluating intrusion detection systems: the 1998 DARPA off-line intrusion detection evaluation, in: Proceedings of the DARPA Information Survivability Conference and Exposition, 2000.
[25] M.V. Mahoney, P.K. Chan, An analysis of the 1999 DARPA/Lincoln Laboratory evaluation data for network anomaly detection, in: Proceedings of the 6th International Symposium on Recent Advances in Intrusion Detection, RAID, 2003.
[26] J. McHugh, Testing intrusion detection systems: a critique of the 1998 and 1999 DARPA intrusion detection system evaluations as performed by Lincoln Laboratory, ACM Transactions on Information and System Security 3 (4) (2000) 262–294.
[27] The MetaSploit project. http://www.metasploit.com/.
[28] NMAP security scanner. http://nmap.org/.
[29] E. Palomo, E. Dominguez, R. Luque, J. Munoz, A new GHSOM model applied to network security, in: Proceedings of the 18th Int’l Conference on Artificial Neural Networks, 2008.
[30] B. Rhodes, J. Mahaffey, J. Cannady, Multiple self-organizing maps for intrusion detection, in: Proceedings of the 23rd National Information Systems Security Conference, 2000.
[31] S. Sarasamma, Q. Zhu, J. Huff, Hierarchical Kohonen net for anomaly detection in network security, IEEE Transactions on Systems, Man, and Cybernetics—Part B: Cybernetics 35 (2) (2005) 302–312.
[32] M.G. Schultz, E. Eskin, E. Zadok, S.J. Stolfo, Data mining methods for detection of new malicious executables, in: Proceedings of the IEEE Symposium on Security and Privacy, 2001.
[33] S.L. Scott, A Bayesian paradigm for designing intrusion detection systems, Computational Statistics and Data Analysis 45 (2004) 69–83.
[34] G. Shu, D. Lee, Testing security properties of protocol implementations—a machine learning based approach, in: Proceedings of the IEEE Int’l Conference on Distributed Computing Systems, ICDCS, 2007.
[35] M. Thottan, C. Ji, Anomaly detection in IP networks, IEEE Transactions on Signal Processing 51 (8) (2003) 2191–2204.
[36] C. Tsai, Y. Hsu, C. Lin, W. Lin, Intrusion detection by machine learning: a review, Expert Systems with Applications 36 (10) (2009) 11994–12000.
[37] WireShark network protocol analyzer. http://www.wireshark.org/.
[38] Z. Yu, J. Tsai, T. Weigert, An adaptive automatically tuning intrusion detection system, ACM Transactions on Autonomous and Adaptive Systems 3 (3) (2008).


Dennis Ippoliti is a Ph.D. candidate in the Department of Computer Science at the University of Colorado, Colorado Springs. His research interests are mainly in cyber security in computer networks and systems. He received his M.A. degree from Bellevue University, in 2001, and his M.S. degree in computer science from the University of Colorado, Colorado Springs in 2006.

Xiaobo Zhou is an associate professor and the chairman of the Department of Computer Science, University of Colorado, Colorado Springs. He is the director and cofounder of the Ph.D. in Engineering degree program with a focus in security. He obtained B.S., M.S., and Ph.D. degrees in computer science from Nanjing University, in 1994, 1997, and 2000, respectively. He was a visiting scientist and a postdoctoral research associate at the University of Paderborn, Germany. He was a visiting assistant professor at Wayne State University, Detroit. His research is mainly in computer network systems, more specifically, autonomic computing in virtualized data centers, scalable Internet services and architectures, quality-of-service and cyber security on the Internet. His research was supported in part by the US National Science Foundation, the Army, and the Air Force Research Lab. Dr. Zhou is a general co-chair of ICCCN 2012, a program co-chair of IEEE ICCCN 2011, a program vice chair of IEEE GLOBECOM 2010, ICCCN 2009, HPCC 2008, and IEEE/IFIP EUC 2008, and the workshop general chair of IEEE ICCCN 2007 and IFIP EUC 2006. He is an associate editor of Computer Communications and the Journal of Network and Computer Applications, Elsevier. He was a guest editor of the Journal of Parallel and Distributed Computing and the ACM Transactions on Autonomous and Adaptive Systems. He was a recipient of an NSF CAREER Award in 2009.