Expert Systems with Applications 38 (2011) 2412–2420
Evolving boundary detector for anomaly detection

Wang Dawei, Zhang Fengbin, Xi Liang

Department of Computer Science and Technology, Harbin University of Science and Technology, P.O. Box 258, 52 Xuefu Road, Nangang District, Harbin City 150080, Heilongjiang Province, PR China

Keywords: Artificial immune systems; Anomaly detection; Real-valued negative selection; Evolutionary search; Hole; Deceiving anomaly

Abstract

In the real-valued negative selection algorithm, the variability of the self samples results in holes on the boundary between the self and non-self regions, and in deceiving anomalies hidden in the self region. This paper analyzes why these problems are difficult to handle with traditionally evolved detectors, and then proposes a method of evolving boundary detectors to solve them. The method uses an improved detector generation algorithm based on evolutionary search to generate boundary detectors. The boundary detectors, constructed by an aggressive interpretation, are allowed to cover a part of the self region. The aggressiveness, controlled by a boundary threshold, can convert some volume of the self samples into the fitness of a boundary detector. This enables the boundary detectors to eliminate the holes on the boundary and gives them an opportunity to detect the deceiving anomalies hidden in the self region. Experiments are carried out using both a 2-dimensional dataset and real-world datasets. The former was designed to demonstrate intuitively that boundary detectors can cover the holes on the boundary, while the latter shows that boundary detectors can detect the deceiving anomalies.

© 2010 Elsevier Ltd. All rights reserved.

1. Introduction

The anomaly detection problem can be stated as a two-class problem: given an element of the space, classify it as normal or abnormal (Patcha & Park, 2007). A very common approach to anomaly detection is to specify a range of variability for each parameter of the system; if a parameter is out of range, it is considered an abnormality (Lane & Brodley, 1999). Many approaches to anomaly detection exist, including statistical (Denning, 1987), machine learning (Chan & Lippmann, 2006) and data mining (Costantina & Donato, 2007) techniques. Unfortunately, these techniques may have difficulty in many anomaly detection applications, because abnormal samples are not available at the training stage.

Immunologically inspired techniques have been successfully applied to anomaly detection (Forrest, Perelson, Allen, & Cherukuri, 1994; Hofmeyr & Forrest, 2000; Simon & He, 2008). The task of anomaly detection may be considered analogous to the immunity of natural systems, since both aim to detect abnormal behaviors of a system that violate the established policy (Boukerche, Machado, & Juca, 2007; Dasgupta & Gonzalez, 2002). Artificial immune systems (AIS) is a relatively new field that tries to exploit the mechanisms present in the biological immune system (BIS) in order to solve computational problems (Gonzalez, Dasgupta, & Gomez, 2003).

The vast majority of developments within AIS have focused on three main immunological theories: clonal selection, immune networks and negative selection. They can roughly be classified into two major categories: techniques inspired by the self/non-self recognition mechanism and those inspired by the immune network theory (Gonzalez et al., 2003).

The negative selection algorithm (NSA) was proposed by Forrest and her group (Forrest et al., 1994). This algorithm is inspired by the mechanism of T-cell maturation and self tolerance in the immune system, and is believed to have a distinct process from alternative methods and to be able to provide unique results with better quality (Garrett, 2005). Different variations of NSA have been used to solve problems of anomaly detection and fault detection, to detect novelties in time series, and even for function optimization (Zhou & Dasgupta, 2007).

The two major data representations in NSA are (low-level) binary representation and (high-level) real-valued representation. Most works on NSA have posed the problem in binary representation (Esponda, Forrest, & Helman, 2004). Binary representation provides a finite problem space that is easier to analyze and is straightforward to use for categorized data. However, NSA in binary representation can hardly handle the many applications that are naturally described in real-valued space (Zhou & Dasgupta, 2007), and it generates a higher false alarm rate when applied to anomaly detection on some data sets (Dasgupta, Yu, & Majumdar, 2003). Gonzalez, Dasgupta, and Kozma (2002) introduced a real-valued representation, called the real-valued negative selection (RNS) algorithm, to alleviate the scaling issues of binary representation. Real-valued


representation provides some advantages such as increased expressiveness, the possibility of extracting high-level knowledge from the generated detectors and, in some cases, improved scalability (Gonzalez & Dasgupta, 2003). Zhou and Dasgupta (2004) proposed an RNS with variable-sized detectors (V-detector). V-detector uses variable-sized detectors and terminates the training stage when enough coverage is achieved.

Detector generation with real-valued representation can employ either random search (Gonzalez et al., 2002; Gonzalez & Dasgupta, 2003; Zhou & Dasgupta, 2004) or evolutionary search (Dasgupta & Gonzalez, 2002). Random search is the classical generate-and-eliminate strategy: only the qualified detectors that do not match the self are selected and used to detect abnormal behavior in new incoming data. Unfortunately, such randomly generated detectors cannot be guaranteed to cover the non-self region in the most efficient way. Evolutionary search uses a genetic algorithm (GA) to generate detectors. The fitness function is based on the number of elements in the training set that belong to the subspace represented by the detector, and on the volume of that subspace. A niching algorithm is applied to obtain different detectors. This approach employs hypercubes as the shape of the detectors and is able to cover the non-self region with fewer detectors.

In this paper, we show the issues in the real-valued negative selection algorithm (RNS) caused by the variability of the self samples, namely the holes on the boundary and the deceiving anomalies, and explain why they can hardly be solved by traditional detectors generated by evolutionary search. We then propose a method which evolves aggressive boundary detectors to cover the non-self region. This approach improves the detector generation method of Dasgupta and Gonzalez (2002): it employs hyperspheres as the shape of the detectors and evaluates the fitness of a detector via its actual covering volume. The aggressive boundary detectors can convert some volume of the self region on the boundary into their own fitness. The increased fitness of the detectors on the boundary decreases the holes and gives the detectors a chance to detect the anomalies hidden in the self region.

The remaining sections of the paper are structured as follows. Section 2 presents the improved detector generation algorithm based on evolutionary search. Section 3 discusses the issues in the real-valued negative selection algorithm caused by the variability of the self samples. In Section 4 we introduce the aggressive boundary detector. We carry out experiments on synthetic data, Fisher's Iris data and KDD CUP 1999 data in Section 5. Finally, some concluding remarks are given in Section 6.

2. Detector generation algorithm based on evolutionary search

Detector generation based on evolutionary search as presented in Dasgupta and Gonzalez (2002) uses hypercubes as the shape of the detectors. In this work, we employ hyperspheres, the same shape as the self samples. The hypersphere detectors are generated using an evolutionary search driven by two main goals (Gonzalez & Dasgupta, 2003):

1. Move the detectors away from the self samples.
2. Maximize the coverage of the non-self region and minimize the overlap among the detectors.

Fig. 1 illustrates the flow chart of detector generation based on evolutionary search, where:

coverage: current volume of the non-self region covered by detectors
desiredCoverage: desired volume of the non-self region covered by detectors
minFitness: minimum fitness allowed for a detector to be included in the mature detector set
maxAttempts: maximum number of attempts to try to evolve a detector with a fitness greater than minFitness

Fig. 1. Detector generation using evolutionary search.

The initial population of the algorithm is a set of randomly generated candidate detectors Dcandidate = {d1, d2, . . . , dm}, where dj denotes a hypersphere detector whose center is the n-dimensional point $(d^j_1, d^j_2, \ldots, d^j_n)$ and whose radius is $r^j_d$. After crossover and mutation operations, the best evolved candidate detector in the population is added to the mature detector set if its fitness is greater than minFitness; otherwise, attempts, which counts the failures to evolve a detector with fitness greater than minFitness, is increased. The algorithm converges when either coverage is greater than desiredCoverage or attempts is greater than maxAttempts.

The fitness calculation evaluates the quality of a candidate in the population. Based on their fitness, candidates may be selected for crossover and mutation operations, or may even become mature detectors. In this work, the fitness of a detector is evaluated by the actual volume of the non-self region covered by the detector. The fitness function is described by the following equation:

$$\mathrm{fitness}(d) = \begin{cases} \mathrm{volume}(d) - \mathrm{overlap}(d) & \text{if } \mathrm{coverage}(d, S) = 0 \\ -1 & \text{if } \mathrm{coverage}(d, S) > 0 \end{cases} \qquad (1)$$

where d denotes the detector whose fitness needs to be evaluated and S is the self sample set. This fitness function is guided by the two main goals mentioned above. The fitness of d is set to −1 if d covers self region; such a determination is in accord with the principle of NSA. If d does not cover any self samples, the fitness has two parts: a detector is rewarded for covering more non-self region, while it is penalized if it overlaps with other mature detectors. We estimate volume(d), the volume of d inside the shape space, and overlap(d), the volume overlapped with other mature detectors, using a Monte Carlo method.
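The fitness evaluation above lends itself to a short sketch. The following Python code is our own minimal illustration, not the authors' implementation: it assumes a unit-hypercube shape space with Euclidean distance, reads D (see Fig. 6 later) as the distance from the detector's surface to the nearest self-sample center, replaces the full GA operators (selection, crossover, mutation) with simple Gaussian-mutation hill climbing for brevity, and omits the coverage-based termination test; all helper names are ours.

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def covers_self(center, radius, self_samples, rs):
    """Traditional rule: the detector covers self if D < rs, where D is the
    distance from the detector's surface to the nearest self-sample center
    (our reading of the paper)."""
    return min(dist(center, s) for s in self_samples) - radius < rs

def estimate_fitness(center, radius, self_samples, rs, mature, n=2000):
    """Monte Carlo estimate of Eq. (1): volume(d) - overlap(d) if the detector
    covers no self region, otherwise -1. 'mature' holds accepted detectors."""
    if covers_self(center, radius, self_samples, rs):
        return -1.0
    dim = len(center)
    inside = overlapped = 0
    for _ in range(n):
        p = [random.random() for _ in range(dim)]       # sample the unit hypercube
        if dist(p, center) < radius:                    # point inside this detector
            inside += 1
            if any(dist(p, c) < r for c, r in mature):  # also in a mature detector
                overlapped += 1
    return inside / n - overlapped / n                  # volume(d) - overlap(d)

def generate_detectors(self_samples, rs, dim, min_fitness=0.005,
                       max_attempts=20, pop_size=30, generations=100):
    """Skeleton of the Fig. 1 flow; 'attempts' counts consecutive failures to
    evolve a detector with fitness above min_fitness (our reading)."""
    mature = []
    fitness = lambda d: estimate_fitness(d[0], d[1], self_samples, rs, mature)
    attempts = 0
    while attempts <= max_attempts:
        pop = [([random.random() for _ in range(dim)], random.uniform(0.01, 0.3))
               for _ in range(pop_size)]                # random candidate detectors
        best = max(pop, key=fitness)
        best_fit = fitness(best)
        for _ in range(generations):                    # evolve the best candidate
            center, radius = best
            child = ([g + random.gauss(0.0, 0.05) for g in center],
                     abs(radius + random.gauss(0.0, 0.02)))
            child_fit = fitness(child)
            if child_fit > best_fit:
                best, best_fit = child, child_fit
        if best_fit > min_fitness:
            mature.append(best)                         # accept a mature detector
            attempts = 0
        else:
            attempts += 1                               # one more failed attempt
    return mature
```

A full implementation would also stop once the Monte Carlo coverage estimate exceeds desiredCoverage, as in Fig. 1.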

3. Issues in evolving detectors

AIS has been successfully applied to many anomaly detection applications in which other techniques, such as statistical and machine learning methods, may have difficulty, because it needs no abnormal samples at the training stage. However, the self sample set is never complete, so we add a variation parameter to provide an approximation of the self set. In this work, the radius of the hypersphere represents the level of variability. Fig. 2 illustrates this variability for five self samples in the whole space: in Fig. 2(a) the radius of the self samples is equal to zero, self samples with a small radius are shown in Fig. 2(b), and in Fig. 2(c) the radius of the self samples is so large that one sample can touch others. This variability results in the problems of holes on the boundary and deceiving anomalies hidden in the self region. In this section, we present these problems and explain why they are difficult to handle using traditional detectors generated by evolutionary search.

3.1. Holes on the boundary

Holes are elements not seen during the training phase (Stibor, Timmis, & Eckert, 2006). Fig. 3(a) shows a pentagram-shaped self region in 2-D space. Fig. 3(b) illustrates detector generation based on evolutionary search: the dark grey area represents the self region, the light grey circles are the mature detectors, and the holes are shown in black. Clearly, there are many holes on the boundary between the self and non-self regions.

The major cause of the holes on the boundary is the variability of the self samples. We quantify this variability using the radius of the self samples. If the self radius is too small, the space between self samples cannot be represented; that is, more self samples would be needed to train the detectors correctly. On the other hand, if the self radius is too large, the non-self region covered by the self samples would be too large to accept.

In a word, we need an appropriate variability to build the self region so that the detectors can be trained correctly. However, this variability makes the boundary irregular. The most important principle of NSA, that detectors cannot cover the self space, makes it hard for the algorithm to generate proper detectors to cover the holes on the boundary, because the irregular boundary makes it easier for candidates to break that rule.

Another reason is the setting of minFitness, a very important parameter of the detector generation algorithm based on evolutionary search. If this parameter is too large, the algorithm can hardly generate a qualified detector whose fitness is larger than minFitness once coverage is relatively high. When it is too small, or even equal to zero, the algorithm may fail to converge, because it will keep finding holes whose volume is larger than minFitness. Summarizing, one can say that minFitness balances the number of holes against the speed of convergence. The global optimality of the GA enables the detector generation algorithm to evolve the best detector in every run. However, if the volume of a non-self region on the irregular boundary that has not been covered by the mature detectors is smaller than minFitness, that non-self region may never be covered.

3.2. Anomalies hidden in the self region

Another problem brought by the variability of the self samples is the deceiving anomalies hidden in the self region. In many practical applications, some abnormal behaviors are quite similar to normal ones. Take KDD CUP 1999 as an example. This is a network intrusion data set which includes four types of attack: DoS, R2L, U2R and Probing.

Fig. 2. Variability of self sample. (a) Zero radius, (b) small radius, and (c) large radius.


Fig. 3. Detector generation using evolutionary search in 2-dimensional space. (a) Self region of pentagram shape, and (b) detectors generated using evolutionary search.

The detection results for R2L and U2R are not as good as those for DoS and Probing when using RNS. After analyzing the characteristics of the "bad" connections that belong to R2L and U2R, we found that many fields of these connections are quite similar to the corresponding fields of the "good" connections. In other words, these "bad" connections are near the "good" ones in the shape space. We call such connections deceiving anomalies. In this section, we show the cause of the deceiving anomalies.

Fig. 4 illustrates the deceiving anomalies in 2-dimensional space. In Fig. 4(a) the self samples in the shape space have no variability. The light grey dots represent the self samples, while the black ones are anomalies. Clearly, the anomalies are quite near the self samples; moreover, the self region is divided into two parts by the anomalies. In Fig. 4(b) the circles represent the self samples with variability, and the black dots again represent the anomalies.

In fact, Fig. 4 shows two types of deceiving anomaly. The first type is anomalies falsely covered by the self samples with variability. In the traditional real-valued negative selection algorithm, the self and non-self elements, which are the only two types of element in the space, cannot overlap each other. However, the variability makes the self samples cover some false region, so that anomalies may be covered by self samples falsely. This situation usually happens on the boundary between the self and non-self regions. The second type is holes in the self region that cannot be covered by detectors because of the setting of minFitness. From Fig. 4 we can see that there are two self regions when the self radius is equal to zero. These two regions join together when we increase the radius. Finally, some non-self regions whose volume is smaller than minFitness are created in the center of the self region. The anomalies in these non-self regions can never be detected by detectors, even though they are not covered by self samples. These deceiving anomalies seem to disappear into the self region.
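To make the first type of deceiving anomaly concrete, here is a small numeric sketch of our own (not from the paper): an anomaly lying between two self samples becomes "self" as soon as the self radius grows enough for the samples' hyperspheres to reach it.

```python
import math

def is_self(x, self_samples, rs):
    """A point is treated as self if it lies within radius rs of any self sample."""
    return any(math.dist(x, s) <= rs for s in self_samples)

self_samples = [(0.40, 0.50), (0.60, 0.50)]   # two normal points
anomaly = (0.50, 0.50)                        # an abnormal point between them

for rs in (0.00, 0.05, 0.12):
    verdict = "self (missed!)" if is_self(anomaly, self_samples, rs) \
              else "non-self (detectable)"
    print(f"rs = {rs:.2f}: anomaly classified as {verdict}")

# With rs = 0 or rs = 0.05 the anomaly stays in non-self space, but at
# rs = 0.12 the two self hyperspheres reach it: a deceiving anomaly.
```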

In summary, the major cause of the deceiving anomalies is again the variability of the self samples. In the next section, we use aggressive boundary detectors to deal with this variability.

4. Boundary detector

The method proposed in this paper evolves boundary detectors to counteract the variability of the self samples. The boundary detectors are constructed by an aggressive interpretation. Fig. 5 illustrates the difference between the traditional and the aggressive detector. In Fig. 5(a), a traditional interpretation is used to construct detectors that are not allowed to cover any self samples. The boundary detectors in Fig. 5(b) are constructed by an aggressive interpretation which allows them to "touch" the self samples. The aggressiveness refers to the permission for a detector to cover part of a self sample. However, too much aggressiveness would make the boundary detectors "invade" too much of the volume of the self region. Therefore we set a boundary threshold rb to control this aggressiveness. Assume that the self radius is rs. The boundary detectors have no aggressiveness if rb is equal to zero; in this case the boundary detectors are the same as detectors constructed by the traditional interpretation. The boundary detectors may "touch" the center of a self sample if rb is equal to rs. In short, the larger rb is, the more aggressive the boundary detectors are.

The only difference between the boundary detectors and the traditional ones lies in the fitness evaluation, shown in Fig. 6, where x is a candidate detector in the population, D is the minimum distance between the candidate and all self samples, rs is the self radius, and rb is the boundary threshold. The fitness of a traditional detector is assigned −1 if D is smaller than rs.

Fig. 4. Deceiving anomaly. (a) Self samples with no variability, and (b) self samples with variability.


However, the aggressive boundary detector is considered to cover a self sample only when D is smaller than rs − rb (with rb < rs). It seems trivial that the only difference is the boundary threshold, but the effect is more subtle than it appears. The aggressiveness controlled by the boundary threshold can convert part of the volume of the self samples into the fitness of the detectors on the boundary. Therefore there is enough volume near the irregular boundary for the algorithm to generate detectors that cover the holes. In addition, this aggressiveness also gives the boundary detectors a chance to detect the deceiving anomalies hidden in the self region. In the case of the holes in the self region, the boundary detectors would not consider them as holes, because they take a part of the self region around the holes as non-self region. Furthermore, the boundary detectors also solve the "Boundary Dilemma" (Zhou, 2005) in another way: boundary detectors can only cover the self samples on the boundary, so the boundary between the self and non-self region is detected, though not explicitly represented.

Fig. 5. Detectors constructed by different interpretations. (a) Traditional, and (b) aggressive.

Fig. 6. Key difference in the boundary detector.

5. Experiments and analyses

Experiments were performed to verify that the boundary detectors can solve the problems presented in the previous section. We carried out the experiments on both a 2-dimensional dataset and real-world datasets. The experiments made use of reproduction with tournament selection, quantum crossover and a Gaussian mutation operator. A Monte Carlo method is employed to perform the fitness evaluation of each detector and to compute coverage after each run.

5.1. 2-Dimensional data

These experiments were designed to demonstrate intuitively that the boundary detectors can cover the holes on the boundary. The entire search space is the 2-dimensional square [0, 1]^2, whose volume Vspace is equal to 1.0. The training self points are distributed randomly over a self region of triangle shape. In order to show the effect of the boundary detector, we evolve boundary detectors with different boundary thresholds.

The algorithm terminates if the desired coverage has been reached or the number of attempts has become larger than maxAttempts. Then we compute Vs-coverage (the volume of the self region), Vd-coverage (the volume of the non-self region covered by detectors), Vd-volume (the total volume of the detectors) and Vb-coverage (the volume of the self region "invaded" by boundary detectors). These values are used to compute three measures of effectiveness, Coverage Rate, Overlap Rate and Aggressiveness Rate, defined as follows:

$$\text{Coverage Rate} = \frac{V_{d\text{-}coverage}}{V_{space} - V_{s\text{-}coverage}} \qquad (2)$$

$$\text{Overlap Rate} = 1 - \frac{V_{d\text{-}coverage}}{V_{d\text{-}volume}} \qquad (3)$$

$$\text{Aggressiveness Rate} = \frac{V_{b\text{-}coverage}}{V_{s\text{-}coverage}} \qquad (4)$$

The following parameters were used in the experiments: crossover rate pcross = 0.8; mutation rate pmutation = 0.1; minimum fitness minFitness = 0.005; maximum generations per run maxGeneration = 100; population size popSize = 100; maximum number of attempts maxAttempts = 20; self threshold rs = 0.08; boundary threshold rb = 0–0.04; desired coverage rate desiredCoverage = 0.99.

Fig. 7 illustrates how well the boundary detectors deal with the holes on the boundary. Fig. 7(a) shows the self region of triangle shape, whose volume is equal to 0.2605. Fig. 7(b)–(d) show the coverage achieved by boundary detectors using different boundary thresholds, while the three measures of effectiveness are shown in Table 1. In the case where the boundary detectors do not have any aggressiveness (rb = 0), there are so many holes on the boundary, which can never be covered by detectors because of minFitness, that the coverage rate does not reach desiredCoverage. Fig. 7(c) shows the coverage achieved by boundary detectors whose boundary threshold is 0.02; again the coverage rate does not reach desiredCoverage, and there are still some holes on the boundary. The coverage rate reaches desiredCoverage when we increase the boundary threshold to 0.04, as shown in Fig. 7(d). Clearly, few holes are left on the boundary; unfortunately, the aggressiveness rate is too high. In addition, as a result of the boundary threshold, the boundary detectors capture the boundary of the self region, as illustrated in Fig. 7(c) and (d).

As mentioned before, the aggressiveness controlled by the boundary threshold can convert part of the volume of a self sample into the fitness of a detector. Fig. 8 illustrates how a boundary detector works: the dashed area that belongs to the self sample is considered as non-self region which can be covered by the detector. In this case the fitness of detectors on the boundary is increased, so the boundary detectors are able to cover the holes on the boundary effectively. Summarizing, one can say that boundary detectors with aggressiveness can eliminate the holes on the boundary; however, too much aggressiveness results in a high aggressiveness rate.
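The change from the traditional to the aggressive interpretation is small enough to state in code. Below is a hedged sketch with our own naming, where D is again measured from the detector's surface as in the sketch of Section 2:

```python
import math

def min_surface_distance(center, radius, self_samples):
    """D: minimum distance from the detector's surface to a self-sample center
    (assumed interpretation of Fig. 6)."""
    return min(math.dist(center, s) for s in self_samples) - radius

def covers_self_traditional(center, radius, self_samples, rs):
    """Traditional detector: any intersection with a self hypersphere disqualifies it."""
    return min_surface_distance(center, radius, self_samples) < rs

def covers_self_boundary(center, radius, self_samples, rs, rb):
    """Boundary detector: allowed to 'invade' the outer rb shell of each self
    sample; it only counts as covering self when it reaches deeper than rs - rb."""
    assert 0.0 <= rb <= rs
    return min_surface_distance(center, radius, self_samples) < rs - rb
```

With rb = 0 the two rules coincide; as rb approaches rs, a detector may reach the center of a self sample, which is why the aggressiveness rate in Table 1 grows with rb.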

5.2. Fisher's Iris data

These experiments were performed to show that boundary detectors can detect the deceiving anomalies hidden in the self region. We carried out the experiments on Fisher's Iris data, which have been widely used as examples in discriminant analysis and cluster analysis. The sepal length, sepal width, petal length and petal width are measured in millimeters on fifty iris specimens from each of three species: Iris setosa, Iris versicolor and Iris virginica. The entire search space is the 4-dimensional hypercube [0, 1]^4. We employed one of the three species as the training data, while the other two species were used to measure the performance of the detectors.

Fig. 7. Using boundary detectors to cover non-self space. (a) Self region of triangle shape, (b) boundary threshold is 0, (c) boundary threshold is 0.02, and (d) boundary threshold is 0.04.

Table 1. Results of using boundary detectors in 2-dimensional space.

Boundary threshold | Number of detectors | Coverage rate (%) | Overlap rate (%) | Aggressiveness rate (%)
0                  | 20                  | 96.83             | 27.68             | 0
0.02               | 14                  | 98.53             | 30.38             | 6.34
0.04               | 11                  | 99.23             | 31.66             | 18.36

In order to highlight the importance of the variability of the self samples, we use 50% of the normal samples to train the detectors. Different self radius and boundary threshold values are used to evolve the boundary detectors. The remaining parameters are listed as follows: self threshold rs = 0–0.08; boundary threshold rb = 0–0.04; desired coverage rate desiredCoverage = 0.99; crossover rate pcross = 0.8; mutation rate pmutation = 0.1; minimum fitness minFitness = 0.003; maximum generations per run maxGeneration = 100; population size popSize = 200; maximum number of attempts maxAttempts = 10. The detection rate and false alarm rate, defined as follows, are used to evaluate the performance of the boundary detectors:

$$\text{Detection rate} = \frac{TP}{TP + FN} \qquad (5)$$

$$\text{False alarm rate} = \frac{FP}{TN + FP} \qquad (6)$$

where TP is the number of anomalous elements identified as anomalous, TN is the number of normal elements identified as normal, FP is the number of normal elements identified as anomalous, and FN is the number of anomalous elements identified as normal.

Fig. 8. Detector with aggressiveness.

All the results are the average of 50 runs with the same control parameters and are shown in Table 2. From Table 2 we can see that, whatever the training data is, the detection rate of the other two species is good when the self radius is equal to zero; however, the false alarm rate is relatively high. This indicates the importance of the variability of the self samples. We only have part of the information needed to build the self region, because the self samples are never complete. When the self radius is too small, or even zero, the space between self samples cannot be represented, so some self region is covered by detectors falsely. On the other hand, as the self radius grows, the false alarm rate is brought down, but the false non-self region covered by self samples becomes so large as to affect the detection rate.

The reason for the low detection rate as the self radius grows is the deceiving anomalies. The detection rate of the other two species is quite good when we employ Setosa as the training data.
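Equations (5) and (6) are the usual confusion-matrix rates; a small helper of our own makes the bookkeeping explicit:

```python
def detection_and_false_alarm(y_true, y_pred):
    """y_true/y_pred use 1 for anomalous and 0 for normal.
    Returns (detection rate, false alarm rate) per Eqs. (5) and (6)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), fp / (tn + fp)

# Example: 4 anomalies and 4 normals, one miss and one false alarm.
print(detection_and_false_alarm([1, 1, 1, 1, 0, 0, 0, 0],
                                [1, 1, 1, 0, 1, 0, 0, 0]))  # (0.75, 0.25)
```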

Table 2. Results of using boundary detectors on Fisher's Iris data (detection rates and false alarm rates in %; "–" marks the training species).

Training data | Self radius | Boundary threshold | Det. rate Setosa | Det. rate Versicolor | Det. rate Virginica | False alarm rate | Number of detectors
Setosa        | 0    | 0    | –     | 100   | 100   | 8.88  | 9.86
Setosa        | 0.04 | 0    | –     | 100   | 99.44 | 1.16  | 10.5
Setosa        | 0.04 | 0.01 | –     | 100   | 100   | 1.48  | 10.00
Setosa        | 0.04 | 0.02 | –     | 100   | 100   | 2.56  | 9.68
Setosa        | 0.08 | 0    | –     | 99.0  | 97.76 | 0.56  | 9.92
Setosa        | 0.08 | 0.01 | –     | 99.44 | 98.48 | 0.76  | 9.78
Setosa        | 0.08 | 0.02 | –     | 100   | 99.16 | 0.88  | 9.12
Setosa        | 0.08 | 0.04 | –     | 100   | 99.48 | 1.00  | 8.84
Versicolor    | 0    | 0    | 99.16 | –     | 92.76 | 19.48 | 20.68
Versicolor    | 0.04 | 0    | 98.60 | –     | 82.56 | 7.80  | 29.18
Versicolor    | 0.04 | 0.01 | 98.88 | –     | 88.44 | 10.08 | 25.76
Versicolor    | 0.04 | 0.02 | 99.08 | –     | 92.40 | 13.56 | 24.32
Versicolor    | 0.08 | 0    | 97.80 | –     | 70.60 | 1.44  | 39.74
Versicolor    | 0.08 | 0.01 | 98.16 | –     | 77.76 | 3.92  | 36.28
Versicolor    | 0.08 | 0.02 | 98.32 | –     | 87.84 | 5.40  | 35.56
Versicolor    | 0.08 | 0.04 | 98.84 | –     | 91.6  | 8.44  | 28.20
Virginica     | 0    | 0    | 100   | 99.64 | –     | 28.56 | 11.22
Virginica     | 0.04 | 0    | 99.32 | 97.24 | –     | 13.88 | 14.20
Virginica     | 0.04 | 0.01 | 99.56 | 98.32 | –     | 15.16 | 13.48
Virginica     | 0.04 | 0.02 | 100   | 98.76 | –     | 17.44 | 12.96
Virginica     | 0.08 | 0    | 98.28 | 90.16 | –     | 6.56  | 20.72
Virginica     | 0.08 | 0.01 | 98.56 | 92.48 | –     | 7.96  | 19.22
Virginica     | 0.08 | 0.02 | 98.84 | 95.00 | –     | 8.64  | 18.88
Virginica     | 0.08 | 0.04 | 99.28 | 97.72 | –     | 16.12 | 16.24

When employing Virginica, the detection rate of Setosa is still good, while the detection rate of Versicolor decreases appreciably. However, the detection rate of Virginica is not satisfactory when Versicolor is the training data. The cause of this result is that the two species Virginica and Versicolor are so similar that the elements normalized from these two species are close to each other. Because of the variability of the self samples, a few anomalies which belong to Virginica hide in the self region built by Versicolor; therefore a few Virginica anomalies cannot be detected by the traditional detectors (rb = 0) when Versicolor is employed as the training data.

The aggressive boundary detectors are able to detect the deceiving anomalies. According to the results shown in Table 2, when employing Versicolor as the training data, the detection rate of Virginica increases gradually as the boundary threshold grows. This demonstrates that boundary detectors can "invade" the self region and detect the deceiving anomalies.

Figs. 9 and 10 illustrate the trend of the boundary threshold's effect on the results for boundary thresholds from 0 to 0.08 when the self radius is 0.08. The detection rate increases as the boundary threshold grows; however, too much aggressiveness results in a fairly high false alarm rate. In a word, the boundary threshold can be used to balance between a high detection rate and a low false alarm rate.

Fig. 9. Detection rate trend on Fisher’s Iris dataset.

Fig. 10. False alarm rate trend on Fisher’s Iris dataset.

5.3. KDD CUP 1999

As mentioned before, KDD CUP 1999 is a network intrusion dataset, which consists of connection-based intrusions and normal network traffic. Each record corresponds to one network connection, i.e. a sequence of Internet packets sent during a period of time between two IP addresses. A complete record is described as a network connection vector which contains 38 continuous fields, 3 symbolic fields and an end-label. The experiments are performed on the 10% data set. This reduced dataset contains 494,021 records, of which 396,473 are anomalous and 97,278 are normal connection vectors. Furthermore, each discriminative symbolic string is mapped onto a natural number, i.e. tcp → 0, udp → 1, icmp → 2, and so on.
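The mapping of the symbolic fields onto natural numbers, together with scaling all fields into the unit hypercube, can be sketched as follows. This is our own illustration; the paper does not spell out its normalization, so min-max scaling is assumed.

```python
def encode_symbolic(records, symbolic_idx):
    """Replace each distinct symbolic string by a natural number in order of
    first appearance, e.g. tcp -> 0, udp -> 1, icmp -> 2."""
    codebooks = {i: {} for i in symbolic_idx}
    for rec in records:
        for i in symbolic_idx:
            rec[i] = codebooks[i].setdefault(rec[i], len(codebooks[i]))
    return records

def min_max_normalize(records):
    """Scale every field into [0, 1] so each record lies in the unit hypercube
    (assumed normalization; the paper only states the search space is [0, 1]^n)."""
    n = len(records[0])
    lo = [min(r[i] for r in records) for i in range(n)]
    hi = [max(r[i] for r in records) for i in range(n)]
    return [[(r[i] - lo[i]) / (hi[i] - lo[i]) if hi[i] > lo[i] else 0.0
             for i in range(n)] for r in records]

# Example with a protocol field at index 1:
raw = [[0.0, "tcp", 215.0], [0.1, "udp", 45076.0], [0.0, "icmp", 0.0]]
data = min_max_normalize(encode_symbolic(raw, symbolic_idx=[1]))
```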

In these experiments, we again use 50% of the normal samples to train the detectors. Different self radius and boundary threshold values are used to evolve the boundary detectors.

Table 3. Results of using boundary detectors on KDD CUP data (all values in %).

Self radius | Boundary threshold | Det. rate DoS | Det. rate Probing | Det. rate U2R | Det. rate R2L | False alarm rate
0.1  | 0      | 27.5  | 52.71 | 23.08 | 24.42 | 4.63
0.1  | 0.025  | 27.61 | 55.39 | 26.92 | 27.09 | 5.31
0.1  | 0.05   | 28.32 | 71.97 | 55.77 | 29.13 | 5.97
0.05 | 0      | 28.32 | 78.67 | 60.75 | 30.12 | 6.14
0.05 | 0.0125 | 91.41 | 83.52 | 70.36 | 43.07 | 6.54
0.05 | 0.025  | 94.5  | 93.64 | 78.85 | 50.69 | 7.13

Fig. 11. Detection rate trend on KDD CUP dataset.

Fig. 12. False alarm rate trend on KDD CUP dataset.


The remaining parameters are listed as follows: self threshold rs = {0.05, 0.1}; boundary threshold rb = {0.025, 0.05}; desired coverage rate desiredCoverage = 0.99; crossover rate pcross = 0.8; mutation rate pmutation = 0.1; minimum fitness minFitness = 0.03; maximum generations per run maxGeneration = 100; population size popSize = 2000; maximum number of attempts maxAttempts = 10. The detection rate and false alarm rate defined in Eqs. (5) and (6) are again used to evaluate the performance of the boundary detectors. The detection results are presented in Table 3.

From Table 3 it can be seen that the boundary detector obtains much better detection performance than the traditional evolved detector, although the false alarm rate is relatively high. In the case where the self radius is equal to 0.1, the detection results for the DoS attacks are quite low and show little improvement even if we increase the boundary threshold. This is because the Smurf attacks, which belong to DoS, are covered by self samples, so they cannot be detected easily. When we decrease the self radius to 0.05, the detection rate of DoS is still not good; however, as the boundary threshold grows, the detection rate increases significantly. This means that the aggressive boundary detectors are able to detect the deceiving anomalies hidden in the self region. Similarly, the boundary detector also increases the detection results for the other three types of attack.

Figs. 11 and 12 show the complete trend of the boundary threshold's effect on the results when the self radius is 0.05 and 0.1. From these two figures we can see that the boundary detector can counteract the effect of the variability of the self samples. Along with the increase of the boundary threshold, the boundary detector can "invade" the self samples and detect the deceiving anomalies. However, when the boundary threshold is relatively high, the aggressiveness of the boundary detectors is so strong that they cover too much self region; as a consequence, this results in a high false alarm rate.

It should be noted that the self radius is another important control parameter for balancing the performance of the system. As shown in Table 3, when the self radius is 0.1, the false alarm rate is lower than when the self radius is 0.05. However, according to Figs. 11 and 12, when the self radius is 0.1 we have to increase the aggressiveness of the boundary detectors to obtain satisfactory detection results, which in turn leads to a high false alarm rate in the solution. In summary, the main control parameters, boundary threshold and self radius, can be used to balance between a high detection rate and a low false alarm rate.

6. Conclusions

This paper proposed a method of evolving boundary detectors to solve the problems of holes on the boundary and deceiving anomalies. The major cause of these problems is the variability of the self samples, which is a necessary characteristic of RNS, and they can hardly be handled by traditional evolved detectors because of the setting of minFitness. We use an aggressive interpretation to construct the boundary detectors, which are allowed to cover a part of the self region. The boundary detectors are able to eliminate the holes on the boundary between the self and non-self regions, because the aggressiveness can convert part of the volume of the self samples into their fitness. Moreover, the boundary detectors have an opportunity to find the deceiving anomalies hidden in the self region. In a word, on the premise that we need the variability of the self samples, the aggressiveness of the boundary detector can counteract this variability and improve the performance of the detection system.

We carried out the experiments on 2-dimensional data to demonstrate intuitively that the boundary detectors can cover the holes on the boundary. The experiments on Fisher's Iris data were then performed to show that boundary detectors can detect the deceiving anomalies. Experimental results show that boundary detectors can achieve a better detection rate, although too much aggressiveness can result in a higher false alarm rate. The experiments demonstrate that the method proposed in this paper can make RNS more practical for anomaly detection. The properties of hyperspheres change as the dimension of the space grows, so in terms of future work we intend to apply the boundary detector in high-dimensional spaces. From a further perspective, we intend to use a fuzzy set to control the aggressiveness instead of a boundary threshold.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. 60671049), the Postdoctoral Foundation of Heilongjiang (Grant No. LBH-Z05092) and the Graduate Innovation Project of Heilongjiang (YJSCX2007-0100HLJ).

References

Boukerche, A., Machado, R. B., & Juca, R. L. (2007). An agent based and biological inspired real-time intrusion detection and security model for computer network operations. Computer Communications, 30, 2649–2660.
Chan, P. K., & Lippmann, R. P. (2006). Machine learning for computer security. Journal of Machine Learning Research, 7, 2669–2672.
Costantina, C., & Donato, M. (2007). A data mining methodology for anomaly detection in network data. In Proceedings of the 11th international conference on knowledge-based intelligent information and engineering systems, Vietri sul Mare, Italy, Lecture notes in computer science (pp. 109–116).
Dasgupta, D., Yu, S. H., & Majumdar, N. S. (2003). MILA—multilevel immune learning algorithm. In Proceedings of the 2003 genetic and evolutionary computation conference, GECCO '03, Chicago, IL, USA, Lecture notes in computer science (pp. 183–194).
Dasgupta, D., & Gonzalez, F. (2002). An immunity based technique to characterize intrusions in computer networks. IEEE Transactions on Evolutionary Computation, 6, 281–291.
Denning, D. E. (1987). An intrusion-detection model. IEEE Transactions on Software Engineering, 13, 222–232.
Esponda, F., Forrest, S., & Helman, P. (2004). A formal framework for positive and negative detection schemes. IEEE Transactions on Systems, Man and Cybernetics, 34, 357–373.
Forrest, S., Perelson, A. S., Allen, L., & Cherukuri, R. (1994). Self-nonself discrimination in a computer. In Proceedings of the 1994 IEEE computer society symposium on research in security and privacy, Oakland, CA, USA (pp. 202–212).
Garrett, S. M. (2005). How do we evaluate artificial immune systems? Evolutionary Computation, 13, 145–177.
Gonzalez, F., & Dasgupta, D. (2003). Anomaly detection using real-valued negative selection. Genetic Programming and Evolvable Machines, 4, 383–403.
Gonzalez, F., Dasgupta, D., & Kozma, R. (2002). Combining negative selection and classification techniques for anomaly detection. In Proceedings of the 2002 congress on evolutionary computation, CEC '02, Honolulu, HI, USA (pp. 705–710).
Gonzalez, F., Dasgupta, D., & Gomez, J. (2003). The effect of binary matching rules in negative selection. In Proceedings of the 2003 genetic and evolutionary computation conference, GECCO '03, Chicago, IL, USA, Lecture notes in computer science (pp. 198–209).
Hofmeyr, S., & Forrest, S. (2000). Architecture for an artificial immune system. Evolutionary Computation, 8, 443–473.
Lane, T., & Brodley, C. E. (1999). Temporal sequence learning and data reduction for anomaly detection. ACM Transactions on Information and System Security, 2, 150–158.
Patcha, A., & Park, J. M. (2007). An overview of anomaly detection techniques: Existing solutions and latest technological trends. Computer Networks, 51, 3448–3470.
Simon, P. T., & He, J. (2008). A hybrid artificial immune system and self organising map for network intrusion detection. Information Sciences, 178, 3024–3042.
Stibor, T., Timmis, J., & Eckert, C. (2006). On permutation masks in hamming negative selection. Lecture Notes in Computer Science, 4163, 122–135.
Zhou, J. (2005). A boundary-aware negative selection algorithm. In Proceedings of the 9th international conference on artificial intelligence and soft computing, IASTED '05, Benidorm, Spain (pp. 12–14).
Zhou, J., & Dasgupta, D. (2004). Real-valued negative selection algorithm with variable-sized detectors. In Proceedings of the 2004 genetic and evolutionary computation conference, GECCO '04, Seattle, WA, USA, Lecture notes in computer science (pp. 287–298).
Zhou, J., & Dasgupta, D. (2007). Revisiting negative selection algorithms. Evolutionary Computation, 15, 223–251.