Applied Soft Computing 32 (2015) 544–552
BIORV-NSA: Bidirectional inhibition optimization r-variable negative selection algorithm and its application

Lin Cui a,b, Dechang Pi a,*, Chuanming Chen a

a College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
b College of Information Engineering, Suzhou University, Suzhou 234000, China
Article info

Article history: Received 11 March 2014; Received in revised form 30 December 2014; Accepted 14 March 2015; Available online 4 April 2015

Keywords: Bidirectional inhibition optimization r-variable negative selection algorithm; Self set edge inhibition strategy; Detector self-inhibition strategy; Detection rate; Detector self-tolerance
Abstract

The original negative selection algorithm (NSA) has the disadvantages that many "black holes" cannot be detected and that excessive invalid detectors are generated. To overcome these defects, this paper improves the detection performance of NSA and presents a bidirectional inhibition optimization r-variable negative selection algorithm (BIORV-NSA). The proposed algorithm includes a self set edge inhibition strategy and a detector self-inhibition strategy. The self set edge inhibition strategy defines a generalized radius for the self individual area, making the self individual radius dynamically variable; to a certain extent, critical antigens close to the self individual area are recognized and more non-self space is covered. The detector self-inhibition strategy, aimed at mutual cross-coverage among mature detectors, eliminates detectors that are recognized by other mature detectors and avoids the production of excessive invalid detectors. Experiments on an artificially generated data set and two standard real-world data sets from UCI were conducted to verify the performance of BIORV-NSA by comparison with NSA and R-NSA. The experimental results demonstrate that the proposed BIORV-NSA algorithm can cover more non-self space, greatly improve detection rates and obtain better detection performance with fewer mature detectors. Crown Copyright © 2015 Published by Elsevier B.V. All rights reserved.
* Corresponding author at: College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China. Tel.: +86 15996304973. E-mail address: [email protected] (D. Pi). http://dx.doi.org/10.1016/j.asoc.2015.03.031

1. Introduction

The idea of applying biological science to solve problems in computer science and engineering has existed for many years; among such efforts, the artificial immune system (AIS) is one of the most important research directions [1]. AIS simulates the function of the biological immune system and provides a feasible approach to complicated problems, performing very well in fields such as machine learning, pattern recognition and anomaly detection. Research on AIS began in the late 1990s. In December 1996, the first international symposium on immune systems was held in Japan, where the concept of AIS was first proposed [2]. In 1997, Ishida gave a comprehensive description of AIS [3]; meanwhile, Dasgupta also published early work on the AIS model and its theory [4]. Earlier, in 1994, Forrest had pioneered the application of AIS theory to computer anomaly detection and proposed the well-known negative selection algorithm (NSA), which is based on the self/non-self discrimination mechanism of the biological immune system and analyzes the relationship between algorithm reliability and the size of the detector set from a mathematical viewpoint [5]. In 1998, Chu et al. introduced a mathematical model into the immune algorithm and pointed out the advantages of AIS over other optimization algorithms [6]. In 2000, de Castro proposed a clonal selection algorithm according to the clonal selection principle [7]. In 2001, Timmis et al. proposed a resource-limited artificial immune algorithm, which simulated the race control mechanism of the natural immune system to control population growth and the termination conditions of the algorithm, and which has been successfully applied to knowledge discovery in databases and other fields [8]. In 2005, Zhang et al. proposed the radius-variable negative selection algorithm (R-NSA), which made the radius variable and decreased the number of "black holes" existing in the original NSA [9]. In the same year, Tao presented dynamic intrusion detection based on the immune model [10]. In 2008, Timmis analyzed the clonal selection algorithm, immune networks and the negative selection algorithm in depth, and furthermore proved the usefulness of AIS [11]. In 2009, Zhang et al. used the differential evolution method as the mutation operator of an immune algorithm and proposed an anti-idiotypic clonal selection algorithm [12]. After 2010, research on anomaly detection with AIS became more popular and more optimized algorithms were proposed [13–15]. Among the algorithms mentioned above, NSA proposed by Forrest [5] is one of the important algorithms of AIS that is applied
to generate detectors in anomaly detection. Since NSA was first conceived, it has attracted many AIS researchers and practitioners and has undergone considerable evolution. For example, the original NSA could not achieve a high detection rate because of its fixed detector radius. Notably, the R-NSA proposed by Zhang [9] makes the detector radius variable and improved the detection rate of NSA to some extent. However, R-NSA is still not optimal and has some inevitable disadvantages of its own. Aiming at the defects of NSA and R-NSA, the bidirectional inhibition optimization r-variable negative selection algorithm (BIORV-NSA) is proposed in this paper.

The remainder of the paper is organized as follows. In the next section, definitions concerning AIS and NSA are introduced in detail; the procedure of NSA is then elaborated and its merits and disadvantages are pointed out. In Section 3, the core idea of the proposed BIORV-NSA is illustrated and its details are given. Section 4 describes our experiments on an artificially generated data set and two standard real-world data sets from UCI, and compares BIORV-NSA with NSA and R-NSA. Finally, Section 5 briefly summarizes the paper and points out directions for future work.

2. Theoretical foundation

2.1. Basic definitions

Immunization is the state-maintaining process of the body, which relies on antibodies to discriminate self from non-self antigens. In artificial immune theory, antibodies are defined as detectors used to recognize non-self elements; defining Ab as an antibody, the detection performance thus depends on the quality of the detectors. An antigen is an element of the training data set; defining Ag as an antigen, the collection of all antigen elements is called the antigen set, denoted AG, in which the normal elements form the self set, denoted SS, and the abnormal elements form the non-self set, denoted NS, with SS ∪ NS = AG and SS ∩ NS = ∅ [16].
The whole space monitored by the immune system is defined as SCOPE; the range detected by antibody Ab is denoted Scope_Ab; the effective space of the self set is called SCOPE_SS; and the detection space of a self individual is denoted Scope_Ss [16]. Without the space-limiting condition, Scope_Ab ⊄ SCOPE, AG ⊂ SCOPE and SCOPE_SS ⊂ ∪Scope_Ss hold [17]. The basic definitions on NSA and AIS required by this paper are as follows:

Definition 1. Affinity: it refers to the matching degree between an antibody and an antigen, usually used to represent the recognition threshold of a detector in the detection space. The most commonly used affinity is distance affinity, also called the detector radius. The affinity based on distance (the detector radius) is calculated as follows [18]:
affinity(Abi, Agj) = √( Σ_{k ∈ valid-prop} (Abi.prop_k − Agj.prop_k)² )    (1)

where Abi and Agj denote antibody and antigen respectively, and prop_k represents the k-th attribute of the antibody or antigen.
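Eq. (1) is the ordinary Euclidean distance over the valid attributes; a direct transcription in Python (the argument names and the list-based representation are illustrative, not the paper's C++ implementation):

```python
import math

def affinity(ab_props, ag_props, valid_props=None):
    """Distance affinity of Eq. (1): Euclidean distance over the valid
    attributes of an antibody and an antigen."""
    if valid_props is None:
        valid_props = range(len(ab_props))
    return math.sqrt(sum((ab_props[k] - ag_props[k]) ** 2 for k in valid_props))

# Two 2-dimensional elements at Euclidean distance 5:
print(affinity([0.0, 0.0], [3.0, 4.0]))  # 5.0
```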
545
Fig. 1. Producing mature detector set D.
2.2. Negative selection algorithm (NSA) [5]

The negative selection algorithm recognizes self and non-self according to the recognition principle of the biological immune system. It simulates the immune tolerance of T lymphocytes: detectors are produced randomly, those that recognize self are eliminated during tolerance, and the rest are kept as mature detectors used to detect non-self individuals. The algorithm comprises a data representation phase, a training phase and a testing phase. In the data representation phase, data are represented in binary or real-valued form. The training and testing phases are as follows:

(i) The training phase (detector generation phase) is shown in Fig. 1. If a randomly generated detector R does not match any self individual in the self set, then R becomes a mature detector and is put into the mature detector set D, until the mature detector set D is eventually formed.

(ii) The testing phase is shown in Fig. 2. The mature detector set D is used to test the inspected data a; if matching succeeds, a is regarded as non-self data, otherwise the next round of judgment starts.

Through analysis of NSA it can be observed that NSA is very robust: it does not rely much on known data, can identify unknown abnormal data, and has an inborn capacity for parallel execution. However, NSA has shortcomings. The detector radius is generally constant, so non-self area left uncovered by some detectors has to be covered by additional mature detectors, which requires more mature detectors to cover the non-self space. The smaller the detection radius, the more detectors are required; and the higher the dimension, the bigger the detector radius needs to be.
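The two phases above (Figs. 1 and 2) can be sketched for real-valued data as follows. This is a minimal sketch, not the paper's code: the fixed radii `self_r` and `det_r`, uniform sampling and the helper names are all assumptions.

```python
import math
import random

def affinity(a, b):
    # distance affinity of Eq. (1)
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def train_nsa(self_set, self_r, n_detectors, dim=2, bound=1.0, rng=random.Random(0)):
    """Training phase (Fig. 1): keep a random candidate as a mature
    detector only if it matches no self individual."""
    mature = []
    while len(mature) < n_detectors:
        cand = [rng.uniform(0.0, bound) for _ in range(dim)]
        if all(affinity(cand, s) > self_r for s in self_set):  # tolerance
            mature.append(cand)
    return mature

def classify(x, mature, det_r):
    """Testing phase (Fig. 2): non-self if any mature detector matches."""
    return "non-self" if any(affinity(x, d) <= det_r for d in mature) else "self"

D = train_nsa([[0.5, 0.5]], self_r=0.2, n_detectors=10)
print(classify([0.5, 0.5], D, det_r=0.1))  # self
```

Note that both radii are fixed here: exactly the limitation that the variable-radius strategies of Section 3 address.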
In addition, from the NSA algorithm we know that whether a detector is mature depends on the self set, and that the operation scale keeps an exponential relationship with the scale of the self set: the larger the detection space is, the higher the detector generation cost would be. Moreover, because mature detectors are randomly generated on the basis of probability theory, cross-coverage among mature detectors exists; that is, the detection ranges of mature detectors cover each other. In a very short period of time the detectors can reach saturation, and in the end the non-self area cannot be completely covered by mature detectors; the undetectable non-self space is called the set of "black holes".

Definition 2 [19]. Pattern: a symbol string consisting of l symbols, denoted X = X1X2X3...Xl, in which each symbol Xi (i = 1, 2, 3, ..., l) takes the value 0 or 1 in this paper.

Definition 3. Matching rules [20]: at present there are many matching rules, such as Hamming distance matching, r-contiguous matching and r-chunk matching. The most commonly used is r-contiguous matching, defined as follows: two strings of length L, a = a1a2a3...aL and b = b1b2b3...bL, match under the r-contiguous-bits rule if and only if ∃i ≤ L − r + 1 such that aj = bj for j = i, i + 1, ..., i + r − 1.

Fig. 2. Testing the inspecting data.

3. The proposed BIORV-NSA

Aiming at the drawbacks of NSA, and in order to improve its performance substantially, this paper presents a bidirectional inhibition optimization r-variable negative selection algorithm (BIORV-NSA). The proposed algorithm mainly includes two parts, a self set edge inhibition strategy and a detector self-inhibition strategy. The self set edge inhibition strategy makes the detector radius variable and achieves better coverage of the non-self space with the mature detectors. The detector self-inhibition strategy avoids mutual cross-coverage among the mature detectors by having candidate detectors tolerate the already existing mature detectors, which reduces the detector generation cost and the number of mature detectors and makes the random detector generation process controllable. The two strategies are introduced in detail below.

3.1. Self set edge inhibition strategy

In AIS, the detection radius of every antibody is variable. Theoretical analysis demonstrates that the smaller the detection radius, the more detectors are required. Likewise, when the number of detectors is fixed, the smaller the detector radius, the larger the uncovered "black holes" become; conversely, better results are obtained. The radius of antibody Ab, denoted Ab.r and also called the detector radius, is defined by formula (2):

Ab.r = min(affinity(Ab, SS) − SS.r)    (2)
where SS.r refers to a generalized radius of a self individual, i.e. a generalization estimate of the effective self region around that individual. The region taking SS.r as its radius is treated as "self area", and only the self individual exists in this area. When an antibody undergoes self-tolerance, the set it is checked against is not the complete self set but the training set, so one or more individual detector radii may overflow their areas; this can happen for each sub-region Scope_SS. If SS.r = 0, then SCOPE_SS is equivalent to ∪Scope_SS. The coincidence degree between SCOPE_SS and ∪Scope_SS depends on how discrete the antigens are: the more discrete the antigens, the more the sizes of the two regions differ. Therefore, in the antibody self-tolerance process, the detection radius of each antibody is inhibited by the detection radius of the boundary self individuals, and the inhibition degree is affected by the discreteness of the antigens. When the inhibition reaches the generalized radius SS.r of the self individual, that value is the extreme inhibition value; in practice, the inhibition value is less than the self set detection radius. The dynamic radius function is as follows:

Ab.r = min(affinity(Ab, SS) − k · SS.r),  k ∈ (0, 1)    (3)
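Eq. (3) can be transcribed directly. In this sketch each self individual shares one generalized radius `self_r`, and `k` is the edge inhibition parameter; the names are illustrative:

```python
import math

def affinity(a, b):
    # distance affinity of Eq. (1)
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def dynamic_radius(ab, self_set, self_r, k=0.8):
    """Eq. (3): the candidate's radius is its distance to the nearest self
    individual minus the edge-inhibited generalized self radius k*SS.r."""
    return min(affinity(ab, s) - k * self_r for s in self_set)

# Candidate at distance 1.0 from the nearest self point, SS.r = 0.5, k = 0.8:
print(dynamic_radius([1.0, 0.0], [[0.0, 0.0], [3.0, 0.0]], self_r=0.5))  # 0.6
```

With k = 1 the full generalized self area of formula (2) is enforced; with k = 0 the self radius is ignored entirely and the radius degenerates to the plain distance to the nearest self individual.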
The self set edge inhibition strategy is shown in Fig. 3. As the figure illustrates, the strategy reduces the mature-detector generation cost to a certain degree. With the same number of detectors as NSA, a wider area can be covered, so the size of the "black holes" is naturally lowered. Under the dynamic-radius strategy, the coverage of the detection range is improved significantly, which indirectly improves the detection rate. To some extent, the recognition of critical antigens near the self individual threshold is solved, and the dependence of the whole detection process on a predetermined detection radius parameter is reduced at the same time.

Fig. 3. Self set edge inhibition strategy.

3.2. Detector self-tolerance strategy

Through the self set edge inhibition strategy, the area of "black holes" is obviously reduced; however, the number of detectors needed is still larger than expected, and excessive invalid detectors exist in NSA. As the typical detector generation mechanism in NSA is a randomized algorithm, uncontrolled randomly generated detectors impose a great cost on the algorithm's performance. According to the random properties of the detectors, there is the following corollary:

Corollary 1. When the size of the antibody set AB reaches a certain degree, there exist antibodies such that:

Scope_Abi ⊆ Scope_Abj  (Abi, Abj ∈ AB)
(4)
On this condition, there exists:

affinity(Abi, Abj) + Abi.r ≤ Abj.r,  Abi ∈ Scope_Abj    (5)
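The containment condition of Corollary 1 (Eqs. (4)–(5)) — a smaller detector lying wholly inside a larger one — can be checked numerically; a sketch with illustrative names:

```python
import math

def affinity(a, b):
    # distance affinity of Eq. (1)
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def is_contained(ab_i, r_i, ab_j, r_j):
    """Eq. (5): Scope_Abi lies inside Scope_Abj iff the center distance
    plus the smaller detector's radius does not exceed the larger radius."""
    return affinity(ab_i, ab_j) + r_i <= r_j

# Detector i (radius 1) centered 2 away from detector j (radius 4): contained.
print(is_contained([2.0, 0.0], 1.0, [0.0, 0.0], 4.0))  # True
# Enlarging i's radius to 3 breaks containment (2 + 3 > 4):
print(is_contained([2.0, 0.0], 3.0, [0.0, 0.0], 4.0))  # False
```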
The relationship between Abi and Abj is shown in Fig. 4. Corollary 1 can also be stated as: Scope_Abi ⊆ Scope_Abj (Abi, Abj ∈ AB) ⇔ affinity(Abi, Abj) + Abi.r ≤ Abj.r ∧ Abi ∈ Scope_Abj, which can be proved by contradiction in the following two steps:

(1) Necessity of the proposition is proved below:

(i) Suppose that Abi ∉ Scope_Abj; it is then obvious that Scope_Abi ⊄ Scope_Abj (Abi, Abj ∈ AB), which is inconsistent with the corollary itself.

(ii) Assume that affinity(Abi, Abj) + Abi.r > Abj.r and Abi ∈ Scope_Abj. Here affinity(Abi, Abj) is the distance from the central point of Abi to the central point of Abj; as Abi ∈ Scope_Abj, this distance is at most the radius of Abj, that is, affinity(Abi, Abj) ≤ Abj.r. Let β = Abj.r − affinity(Abi, Abj); then β is the distance from the center of Abi to the boundary of Abj. As Scope_Abi ⊆ Scope_Abj, β ≥ Abi.r, that is, β − Abi.r ≥ 0. According to the above hypothesis affinity(Abi, Abj) + Abi.r > Abj.r, then
there is affinity(Abi, Abj) + Abi.r > affinity(Abi, Abj) + β, which simplifies to Abi.r > β, i.e. β − Abi.r < 0. This contradicts β − Abi.r ≥ 0 above, so the necessity of the proposition is proved.

(2) Sufficiency of the proposition is proved below:

Suppose Scope_Abi ⊄ Scope_Abj (Abi, Abj ∈ AB); then there are two cases. When Abi ∉ Scope_Abj, this is obviously inconsistent with the proposition. When Abi ∈ Scope_Abj, let β = Abj.r − affinity(Abi, Abj), where β is the maximum distance from the central point of Abi to the edge of Abj. Since Scope_Abi ⊄ Scope_Abj was supposed, β must be less than the detection radius of Abi, i.e. β − Abi.r < 0. As Abi ∈ Scope_Abj, affinity(Abi, Abj) ≤ Abj.r holds; and because affinity(Abi, Abj) + Abi.r ≤ Abj.r, we get Abi.r ≤ Abj.r − affinity(Abi, Abj), i.e. Abi.r ≤ β, that is, β − Abi.r ≥ 0. This contradicts β − Abi.r < 0 above, so the sufficiency of the proposition is proved.

Combining necessity and sufficiency, Corollary 1 is proved.

Fig. 4. Cross-coverage of antibody.

From Fig. 4 it can be observed that not all detectors are effectively utilized: the more detectors there are, the more r-variable detectors end up in the state that Corollary 1 describes. To avoid this problem, and referring to the function of immune tolerance, the following strategy is adopted: if a candidate detector is recognized by an existing mature detector, the candidate detector is eliminated. The inhibition degree depends on the recognition degree. This inhibition can greatly reduce the cost of generating mature detectors. The judgment of tolerance success is given by formula (6):

affinity(Abi, Abj) > Abj.r − α · AB.r,  Abj ∈ AB    (6)

From the above descriptions of the self set edge inhibition strategy and the detector self-tolerance inhibition strategy, it can be observed that the proposed optimization of NSA facilitates two kinds of performance improvement, so the proposed algorithm is called the bidirectional inhibition optimization r-variable negative selection algorithm (BIORV-NSA).

3.3. Regulation of detector self-inhibition

Although BIORV-NSA effectively improves the detector coverage rate and decreases the cost of generating mature detectors, strictly limiting the detector self-inhibition strategy would mean that randomly generated detectors could no longer mature over a sustained period of time; and if the number of mature detectors is strictly limited, the antibody self-inhibition strategy has to be eased, which would cause the boundary areas to be covered repeatedly again. To solve these problems, the following measure is taken: if candidate detectors cannot mature within a continuous time t, the mature detectors are assumed to have met the basic requirements. The time t depends on the largest detection radius among the mature detectors and the number of detectors still to be generated. Denote the maximum detection span of the SCOPE spatial dimension as SCOPE.max r, the maximum detection radius of the mature detectors as AB.max r, and the remaining amount as AB.residual. The required detection quantity N is expressed as follows:

N = λ · AB.residual · SCOPE.max r / AB.max r    (7)
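Putting the pieces together, the generation loop regulated by the counter and threshold might look like the following sketch. Two points are assumptions on top of the paper: AB.r in Eq. (6) is interpreted as the mean mature radius, and a fixed failure budget stands in for the threshold N of Eq. (7).

```python
import math
import random

def affinity(a, b):
    # distance affinity of Eq. (1)
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def biorv_nsa(self_set, self_r, n_detectors, dim=2, bound=1.0,
              k=0.8, alpha=1.2, threshold_n=200, rng=random.Random(1)):
    """Sketch of Steps 1-5 of Section 3.4: dynamic radius per Eq. (3),
    detector self-inhibition per Eq. (6), and Counter_AB checked against
    a threshold standing in for N of Eq. (7)."""
    mature = []      # list of (center, radius) pairs
    counter_ab = 0   # Counter_AB
    while len(mature) < n_detectors and counter_ab < threshold_n:
        cand = [rng.uniform(0.0, bound) for _ in range(dim)]
        r = min(affinity(cand, s) - k * self_r for s in self_set)  # Eq. (3)
        if r <= 0:
            continue  # recognized by the self set: regenerate (Step 3)
        ab_r = sum(dr for _, dr in mature) / len(mature) if mature else 0.0
        # Eq. (6): tolerance fails when the candidate's center lies too
        # deep inside some existing mature detector
        if any(affinity(cand, c) <= dr - alpha * ab_r for c, dr in mature):
            counter_ab += 1
            continue
        mature.append((cand, r))
        counter_ab = 0
    return mature

D = biorv_nsa([[0.5, 0.5]], self_r=0.1, n_detectors=5)
print(len(D))  # at most 5 mature detectors, each with a positive radius
```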
In formula (7), N is regarded as a measure of continuously immature individuals: if the count exceeds this threshold, the generation of mature detectors is terminated. The coefficient λ expresses how strictly the coverage extent is enforced; as λ increases, the detection amount and the number of invalid-detector judgments both increase.

3.4. Procedure of BIORV-NSA algorithm

From the above descriptions of the self set edge inhibition strategy, the detector self-tolerance strategy and the regulation of detector self-inhibition, it can be seen that several parameters denote the relevant concepts in BIORV-NSA. With different inhibition parameters, different inhibition schemes can be formulated for different problems. The core idea of BIORV-NSA is described as follows:

Step 1 The self set SS and the self set detection radius SS.r are determined first; the detector inhibition counter is denoted Counter_AB and is initially set to zero.

Step 2 Randomly generate an antibody Ab.

Step 3 According to the self set edge inhibition strategy above, Ab performs inhibition tolerance on the self set SS. If tolerance fails, i.e. Ab is recognized by the self set, this Ab is eliminated and step 2 is repeated. If self recognition fails, assign Ab the dynamic radius defined in the self set edge inhibition strategy.

Step 4 According to the detector self-inhibition strategy and its regulation, judge whether Ab cross-covers other mature detectors. If overlap exists, Ab is abandoned and Counter_AB is increased; once the count reaches the threshold N of formula (7), generation stops. Conversely, if the inhibition fails, Ab is added to the mature detector set AB and Counter_AB is reset to 0.
Step 5 Repeat steps 2 to 4 until the mature detector set AB is full or Counter_AB reaches the threshold N given by formula (7).

Fig. 5 illustrates the core idea of BIORV-NSA. From this core idea it can further be seen that boundary antigens can be identified thanks to the introduction of the generalization radius SS.r, and that the number of required mature detectors is greatly reduced because many invalid detectors are eliminated. That is to say, BIORV-NSA performs self set individual
Fig. 5. Core idea of BIORV-NSA.
dynamic optimization and detector inhibition optimization for NSA. To verify the performance of BIORV-NSA, the following simulation experiments were executed.

4. Experiments

In this section, experiments on an artificially generated data set and two standard real-world data sets from UCI (http://archive.ics.uci.edu/ml/) are carried out to test the effectiveness of BIORV-NSA by comparison with NSA and R-NSA. The experimental data and environment are described in detail in Section 4.1; Section 4.2 presents the experimental results on the three data sets.

4.1. Experimental data and environment

To verify the effectiveness of the proposed BIORV-NSA method based on the self set edge inhibition strategy and the detector self-inhibition strategy, a randomly generated 2-dimensional data set and two standard real-world data sets from UCI were selected as experimental samples; the two UCI data sets are the 4-dimensional simple data set Skin Segmentation and the high-dimensional complex data set Hepatitis. BIORV-NSA is programmed in C++, with the self set SS, antigen set AG and detector set AB all implemented as classes. Two data processing models are used to store data: one adopts static lists, suitable for static data of a fixed size; the other uses dynamic linked lists for large amounts of dynamically growing data. The experimental comparisons with the traditional NSA and R-NSA are analyzed below.

4.2. Experimental analysis

In the following experiments, BIORV-NSA is divided into two algorithms: a bidirectional weak inhibition optimization r-variable negative selection algorithm (BWIORV-NSA) and a bidirectional strong inhibition optimization r-variable negative selection algorithm (BSIORV-NSA). The difference between
BWIORV-NSA and BSIORV-NSA lies in the inhibition of the number of mature detectors: BWIORV-NSA tries to generate the full expected number of detectors as far as possible, whereas BSIORV-NSA does not; according to the regulation in formula (7), once the iteration reaches a certain extent, BSIORV-NSA no longer executes the detector judgment. In both BWIORV-NSA and BSIORV-NSA the simulation parameters are as follows: the self set edge inhibition parameter is 0.8 and the antibody self-inhibition parameter is 1.2.

(i) Experiment on the randomly generated 2-dimensional data set

The randomly generated 2-dimensional data make the problem easy to visualize; data generation adopted a plane-region segmentation method. The 200 × 200 plane is divided into blocks of unit length and width, an attribute value is assigned to each unit to decide whether the block is self or non-self, and finally all generated data are noise-fuzzified. The whole data set is divided into two groups: one with self data in the majority and one with non-self data in the majority. On these two data sets, NSA, R-NSA and BIORV-NSA (including BWIORV-NSA and BSIORV-NSA) were each run 25 times independently. The average test results are as follows:

(a) Self data as majority: a total of 420 data were generated, including 400 self data, 20 non-self data and 26 critical data; the comparative experimental results are shown in Table 1. From Table 1 it can be observed that BSIORV-NSA performs best on the self-dominated data set. BSIORV-NSA only needs to generate about 9 mature detectors on average, which greatly reduces the detector generation time, the average false alarm and the average missing report compared with the other three algorithms, while its average detection rate is the highest. For the definitions of average false alarm, average missing report and average detection rate, please refer to Ref. [21].
(b) Non-self data as majority: a total of 210 data were generated, among which there are 10 self data, 200 non-self data and 6 critical data; the comparative experimental results are shown in Table 2.
Table 1
Experimental results on self data as majority.

Algorithm      Antibody (effective/total amount)   Generating time (ms)   Average false alarm   Average missing report   Average detection rate (%)
NSA            400/428.95                          1.62                   13.24                 1.10                     96.59
R-NSA          400/430.58                          1.67                   0.25                  0                        99.94
BWIORV-NSA     400/1725.28                         4.57                   0.10                  0.034                    99.97
BSIORV-NSA     9.18/97.09                          0.77                   0                     0                        100.00
From Table 2 it can be observed that, when non-self data are many and self data very few, the amount of data randomly sampled by the BIORV-NSA algorithms is quite small. With very little prior knowledge, BWIORV-NSA plays an important role: compared with BSIORV-NSA, BWIORV-NSA keeps about 11 detectors, within the acceptable range, and its relatively high average detection rate shows its superiority over NSA, R-NSA and BSIORV-NSA. Further comparison of Tables 1 and 2 shows that, for a large area of abnormal data, the average detection rates of all algorithms decreased a little, which may be related to the sampling number. NSA aside, the average false alarm and average missing report of the other three algorithms both rise to a certain extent. The behavior of BWIORV-NSA is depicted in the planar graph in Fig. 6. In Fig. 6, yellow circles represent the coverage area of the mature detectors, white areas the uncovered region, green areas the self region, and red spots non-self elements. Some red points clearly lie within green groups. Outside the areas covered by the mature detectors, besides the middle region, non-self elements also lie on the periphery. Therefore, in the second group (non-self data as majority), it can be inferred that most of the missing reports are caused by the critical data, and it can be concluded that BWIORV-NSA is slightly dominant.

(ii) Experiments on the Skin Segmentation data set

The Skin Segmentation data set is 4-dimensional: its first three dimensions are the red, green and blue (R, G, B) values representing normal facial color data in the range [0,255], and the fourth dimension is the class attribute.
The total learning sample size of the Skin Segmentation data set is 245,057, of which 50,859 are skin samples and 194,198 are non-skin samples. Because the Skin Segmentation data set allows more intuitive analysis and has higher reference value, its experiments are analyzed in depth in this section. The detection scope of each color point is set to less than 10 and the detection radius to 17. Normal color is not distributed over the whole space: in the RGB model it can be estimated that only three of the eight basic colors approach face color, and of those three only about half belong to normal colors according to the face color domain. In order to analyze BWIORV-NSA and BSIORV-NSA conveniently, the expected number of detectors is greatly reduced, to 1200. Experimenting on this processed data set,
the comparative results among the NSA, R-NSA, BWIORV-NSA and BSIORV-NSA are shown in Table 3. From Table 3, it can be observed that BIORV-NSA algorithm has been improved to a certain extent than R-NSA in the detection rate. Under the BSIORV-NSA, the number of the mature detectors fell nearly 1/3 of the original. Under the pressure of very huge detecting data, BWIORV-NSA and BSIORV-NSA both require the less number of generating mature detectors and their detection rates are also higher compared to NSA and R-NSA. The following is that BWIORV-NSA and BSIORV-NSA are analyzed more deeply. Firstly, the inhibition parameters involved in the two algorithms are set to the parameter domains respectively. Referring to the self set edge inhibition strategy mentioned in Section 3.1, the extreme value of self set edge inhibition is the length of a radius, so the self set edge inhibition value is set to the range of [0.1,0.9] and 0.1 is step value. According to the detector selfinhibition strategy and the regulation of detector self-inhibition described in Sections 3.2 and 3.3, the extreme value of detector set-inhibition is the length of a whole cell, i.e. a diameter, so the detector self-inhibition value is set to the range of [0.2,1.9] and 0.1 is step value. The computing results are represented by 3D mapping as shown in Fig. 7. In Fig. 7, the Inhibition Detect in X axis is the detector self inhibition parameter, the Inhibition Train in Y axis is the self set edge inhibition parameters and Z axis is total number of antibody and shaft. From Fig. 7, the general trend can be observed, when X is in [0.5,1.5], color is deeper, Y is not very obvious, we will remapped 3-dimensional map to 2-dimensional map, which can be shown in Fig. 8. Obviously, it can be observed that when Y axis is in [0.4, 0.9], there is also deeper color in Fig. 8. The false number analysis diagram is shown in Fig. 9. It is obvious that there are five high peaks in Fig. 
9, in which there are four high peaks in X > 2.0. Because X has already been set to the range of [0.5,1.5], these four high peaks in X > 2.0 would be neglected. The remaining high peak lies in Y = 0.8, which could be used as a special case. Through the 2-dimensional map above 3D map in Fig. 9, it can be very clearly observed that when X and Y are smaller, the number of false negatives is lower, which represents that although strong inhibition strategy has been significantly improved in the detector number, detection rate is affected, the high detection area mostly appear in X < 1.5, Y < 0.7. And combined with the detector number in Figs. 7 and 8, a more satisfactory interval can be got: X ∈ [0.5, 1.5], Y ∈ [0.4, 0.7]. Compared with the above figures listed above, in the interval X ∈ [0.5, 1.5], Y ∈ [0.4, 0.7], a pair of optimal value X = 0.7, Y = 0.6 is taken out to experiment again, the experimental result is shown in Table 4.
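The two-parameter sweep described above amounts to a simple grid search over the inhibition parameters. A minimal sketch is given below; `evaluate` is a hypothetical stand-in for one full detector-generation-and-detection run on the data set (here replaced by a toy surrogate so the sketch runs), and the ranges follow the text: detector self-inhibition X over [0.2, 1.9] and self set edge inhibition Y over [0.1, 0.9], both with step 0.1.

```python
import itertools

def evaluate(x, y):
    """Hypothetical stand-in for one BIORV-NSA run: returns
    (total detectors generated, false negatives) for a detector
    self-inhibition x and self set edge inhibition y.
    Toy surrogate only; a real run would train and test detectors
    on the Skin Segmentation data instead."""
    return 1000.0 / (x * y), abs(x - 0.7) + abs(y - 0.6)

def sweep():
    xs = [round(0.2 + 0.1 * i, 1) for i in range(18)]  # X in [0.2, 1.9]
    ys = [round(0.1 + 0.1 * i, 1) for i in range(9)]   # Y in [0.1, 0.9]
    results = {}
    for x, y in itertools.product(xs, ys):
        results[(x, y)] = evaluate(x, y)
    # Pick the parameter pair that minimizes false negatives,
    # as done visually with Figs. 7-9 in the text.
    best = min(results, key=lambda k: results[k][1])
    return results, best
```

Each of the 18 x 9 = 162 grid points corresponds to one cell of the 3D maps in Figs. 7 and 9; in the paper the detector count and the false-negative count are traded off visually rather than by a single scalar criterion.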
Table 2
Experimental results on non-self data as majority.

Algorithm     Antibody (effective/total amount)   Generating time (ms)   Average false alarm   Average missing report   Average detection rate (%)
NSA           400/406.67                          1.52                   1.14                  14.48                    92.56
R-NSA         400/406.41                          1.68                   0.27                  4.18                     97.88
BWIORV-NSA    11.34/749.14                        0.59                   0.66                  3.07                     98.22
BSIORV-NSA    7.03/115.2                          0.13                   0.40                  5.00                     97.43
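For reference, the reported rates can be recovered approximately from the raw counts. The exact definition is not spelled out in the text, so the formula below (detection rate as the fraction of test samples not missed, in percent) is an assumption checked against the R-NSA row of Table 3.

```python
def detection_rate(missing, total):
    """Assumed definition: percentage of test samples that were
    not missed (false negatives) by the detector set."""
    return 100.0 * (1.0 - missing / total)

# Skin Segmentation has 245,057 samples; R-NSA misses 3499.86 on
# average, which reproduces roughly the 98.57% reported in Table 3.
rate = detection_rate(3499.86, 245057)
```

The NSA row of Table 3 (43,234.67 missed, 82.35%) is consistent with the same formula; the averaged rows of Table 2 combine several runs and cannot be checked this way.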
Fig. 6. Plane display of BWIORV-NSA in 2-dimensional data.
Table 3
Experimental results on the Skin Segmentation data set.

Algorithm     Antibody (effective/total amount)   Generating time (ms)   Detection time (ms)   False alarm   Missing reporting   Detection rate (%)
NSA           1200/1274                           1786.75                60,827.5              9.83          43,234.67           82.35
R-NSA         1200/1275.57                        1767.86                7823                  0             3499.86             98.57
BWIORV-NSA    1151.3/64,039.6                     87,928.4               6188.7                0             2932                99.00
BSIORV-NSA    960.6/48,746.2                      67,265.8               5178.3                0             2108.8              99.14
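The two inhibition strategies compared in Table 3 can be sketched in a minimal real-valued form. This is a sketch under assumptions, not the paper's exact procedure: the generalized self radius factor (`edge_inh`, self set edge inhibition) and the discard test against already-accepted detectors (`self_inh`, detector self-inhibition) follow the description in Section 3, while the data, radii and parameter names are illustrative.

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def generate_detectors(selfs, self_r, n_wanted, edge_inh, self_inh,
                       dim=2, seed=0):
    """Minimal BIORV-NSA-style candidate censoring in the unit cube.

    edge_inh scales the self radius (self set edge inhibition), so the
    tolerated self area is variable; self_inh discards candidates lying
    within self_inh * r of an already-accepted detector (detector
    self-inhibition), suppressing mutually cross-covering detectors."""
    rng = random.Random(seed)
    detectors, r = [], self_r
    while len(detectors) < n_wanted:
        c = tuple(rng.random() for _ in range(dim))
        # Self tolerance against the generalized (variable) self radius.
        if any(dist(c, s) < self_r * edge_inh for s in selfs):
            continue
        # Detector self-inhibition: drop candidates already covered.
        if any(dist(c, d) < r * self_inh for d in detectors):
            continue
        detectors.append(c)
    return detectors
```

With a larger `self_inh` the accepted detectors are forced apart, which is the mechanism by which the effective detector counts in Table 3 fall while coverage of non-self space is retained.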
Fig. 7. Antibody total map.
Fig. 8. The total Z-antibody 2-dimensional map.
Table 4
Experimental results at the optimal values X = 0.7, Y = 0.6.

Algorithm                    Antibody (effective/total amount)   Generating time (ms)   Detection time (ms)   False alarm   Missing reporting   Detection rate (%)
BWIORV-NSA (original)        1151.3/64,039.6                     87,928.4               6188.7                0             2932                99.00
BSIORV-NSA (original)        960.6/48,746.2                      67,265.8               5178.3                0             2108.8              99.14
BWIORV-NSA (X=0.7, Y=0.6)    1152.8/65,635.6                     89,229.4               5990.9                0             2367                99.03
BSIORV-NSA (X=0.7, Y=0.6)    957.6/51,370.4                      65,932.7               5094.4                0             2025.29             99.17
Fig. 9. Z-FN three-dimensional map.
Table 5
Experimental results on the Hepatitis data set.

Algorithm     Antibody (effective/total amount)   Generating time (ms)   Average false alarm   Average missing report   Average detection rate
NSA           540,000/540,000                     3507.3                 0                     32                       –
R-NSA         540,000/540,000                     3672.0                 0                     32                       –
BWIORV-NSA    4992/157,992                        61,965.4               27                    22                       68.39%
BSIORV-NSA    4992/176,486                        69,414.0               11                    20                       80.00%
As Table 4 shows, the new parameter setting X = 0.7 and Y = 0.6 has a positive influence on the optimization of the results, which indicates that the selected range performs well on this data set.
(iii) Experiments on the complex Hepatitis data set

The Hepatitis data set is a complex data set with 20 attributes. It mainly describes the vital signs of the human body and contains 155 records, of which 123 belong to the survival class and 32 to the death class. The fifteenth and nineteenth attributes are discarded and the remaining 17 attributes are used. To avoid the excessively long run time brought by the high-dimensional data, category codes are selected for identification, and according to the dimension categories the required number of detectors is estimated probabilistically to be 524,288. Because the complexity of the inhibition-based generation algorithm is high, 5000 detectors are taken out in the inhibition algorithms. The results are shown in Table 5.

It can be seen from the results that NSA and R-NSA could not determine the abnormal state in the Hepatitis data set, so their detection rates cannot be calculated. The inhibition algorithms BWIORV-NSA and BSIORV-NSA, in contrast, can cover the space quickly enough to achieve detection. Table 5 also shows that all the negative selection algorithms struggle with high-dimensional data: their generating time is high, and the "black holes" region under high-dimensional data is difficult to reduce. Nevertheless, the BSIORV-NSA inhibition algorithm improves considerably on the other three algorithms.

5. Conclusion and future works

This paper first introduced the original NSA and then, aiming at its disadvantages, proposed a novel detector generation algorithm, BIORV-NSA. BIORV-NSA optimizes NSA with respect to parameter dependence, coverage efficiency, boundary detection and so on. Using an artificially, randomly generated data sample and two standard real-world data sets from UCI, in-depth experimental comparisons with NSA and R-NSA were made. The experimental results show that BIORV-NSA can reduce the size of the "black holes" and the operation time while increasing the detection rate; in particular, the number of mature detectors is significantly reduced by changing the threshold and the detector self-tolerance, which indicates that BIORV-NSA is an effective algorithm. However, BIORV-NSA is not yet fully optimized and still has shortcomings. For example, inhibition in the immune system lies not only in tolerance but, even more, in dynamic adjustment. From the perspective of biological balance, the appearance of new antibodies implies the death of old
antibodies, whereas in BIORV-NSA the mature detectors never die, which means that new detectors remain restricted by the existing mature ones. This generation scheme is similar to a greedy method and cannot achieve full optimization. Therefore, the proposed BIORV-NSA can be further improved in the following two aspects: (1) Dynamically changing the central position of mature antibodies. Different from clonal learning, the displacement of mature antibodies would be driven by the stimulation of new antibodies or new self antigens, moving from the side of high stimulus to the side of low stimulus while their own detection radii change accordingly. (2) Relearning from the decision result. When anomaly detection programs analyze and identify abnormal data, the results tend to be discarded; in fact, the detectors can relearn from these results, which is also one of our future research directions.

Acknowledgments

This work was supported by the National Natural Science Foundation of China Civil Aviation Joint Fund (Grant No. U1433116) and the Key Project of the Anhui Province Colleges and Universities Natural Science Foundation of China (No. KJ2014A250). We also thank the editors and reviewers for their valuable advice.