Research on virus detection technique based on ensemble neural network and SVM



Neurocomputing ∎ (∎∎∎∎) ∎∎∎–∎∎∎

Contents lists available at ScienceDirect

Neurocomputing journal homepage: www.elsevier.com/locate/neucom

Bo-yun Zhang a,*, Jian-ping Yin b, Shu-Lin Wang c, Xi-ai Yan a

a Department of Computer Science and Technology, Hunan Police Academy, 410138 Changsha, China
b School of Computer Science, National University of Defense Technology, 410083 Changsha, China
c School of Computer and Communication, Hunan University, 410082 Changsha, China

Article info

Article history: Received 4 January 2013; received in revised form 19 March 2013; accepted 8 April 2013

Keywords: Information security; Computer virus; Ensemble neural network; D–S theory of evidence

Abstract

Computer viruses have become a serious threat to information systems. The complexity and behavioral uncertainty of virus code, together with the emergence of encrypted and metamorphic viruses, have made traditional detection methods ineffective, so applying artificial-intelligence-based approaches to virus detection has become a focal issue of current antivirus research. In this paper, we propose a novel approach that introduces ensemble learning into automatic virus detection by integrating dynamic and static detection. The detection system uses support vector machines (SVMs) as member classifiers to build the dynamic behavior model of viruses, and probabilistic neural networks (NNs) as member classifiers for static behavior modeling. The detection results of all member classifiers are then combined using the D–S theory of evidence. The experiments show that the diversity gained by combining heterogeneous classifiers greatly improves the performance of the ensemble virus detector. The experimental results also show that the proposed approach is very efficient in detecting unknown and metamorphic viruses, and a further comparison indicates that its performance is superior to most of the popular commercial antivirus tools. © 2014 Elsevier B.V. All rights reserved.

* Corresponding author.

1. Introduction

The steadily growing number of viruses has become one of the greatest threats to information security, and the emergence of encrypted and metamorphic viruses has rendered the traditional code-scanning method ineffective. New antivirus methods are therefore urgently needed. Making virus detection engines intelligent and automatic is currently the research focus in this field. Kephart et al. [1] at the IBM Watson research center studied the application of neural networks to virus detection and used them to detect boot sector viruses with good results. However, at that time only 200 boot sector viruses were detected, which accounted for 5% of the total number of virus samples in the IBM virus research center. Arnold et al. [2] later applied a similar technique to the detection of Win32 programs. Assaleh et al. [3] observed that n-grams extracted from the byte sequence of a program carry effective information for classifying malicious code. They used common n-grams as features and applied the K-nearest neighbor (KNN) classification algorithm to the detection of computer viruses. Kolter et al. [4] applied machine learning algorithms to malicious code detection. They collected 1971 normal programs and 1651 malicious programs and used n-grams extracted from the byte code as classification features. Using WEKA [5] as the experimental platform, they tested eight kinds of classifiers, of which the ensemble classifier Boosted J48 achieved the best performance; that is, the ensemble classifier performed better than any single classifier. Bergeron et al. [6] focused on the role of the system call sequence of malicious code in detection. They reduced the program's control flow to a subgraph in which each node represented a system call, and then checked whether the subgraph contained abnormal system calls. When the number of abnormal calls in the program under test exceeded a certain threshold, the program was identified as malicious. Akira Mori [7] combined static code analysis and code simulation to detect malicious behaviors common in viruses, such as spamming, file infection and registry overwriting. Program behaviors were obtained by observing OS system calls, and the author tested the approach on viruses in a mail system. Victor Skormin et al. [8] noticed that most normal programs do not perform "self-replication". Based on this observation, they put forward the concept of a "self-replicating gene" and transformed the detection of polymorphic malicious code into the detection of polymorphic self-replicating genes. The authors applied this technology to unencrypted script virus

http://dx.doi.org/10.1016/j.neucom.2013.04.055
0925-2312/© 2014 Elsevier B.V. All rights reserved.

Please cite this article as: B.-y. Zhang, et al., Research on virus detection technique based on ensemble neural network and SVM, Neurocomputing (2014), http://dx.doi.org/10.1016/j.neucom.2013.04.055


B.-y. Zhang et al. / Neurocomputing ∎ (∎∎∎∎) ∎∎∎–∎∎∎

detection in order to detect unknown viruses, and showed that the technique could detect new malicious code written in new programming languages. Christodorescu and Somesh Jha [9] used a technique based on code obfuscation to test three common commercial antivirus products, i.e., Norton antivirus, Sophos antivirus and McAfee Virus Scan. The comparison showed that commercial antivirus tools were weak against obfuscated viruses. Jesse et al. [10] designed an anomaly detection system, DOME, suitable for detecting injected, dynamically generated and obfuscated malicious code. The system used IDA Pro to statically disassemble the binary code and locate the Win32 API calls in the process, and monitored all API calls. It also built a model of the program's normal behavior and then checked whether the APIs used by the process under test deviated from the normal pattern. However, static analysis of API calls can neither handle dynamically loaded libraries nor resist encrypted viruses, which are its weak points. Akira Mori et al. [11] combined code simulation, static code analysis, simulation of the OS running environment and other technologies to design a detection tool for malicious mobile code. Unlike traditional antivirus software, this tool directly checked for and identified common malicious behaviors such as spamming, self-replication and registry modification, without using virus signatures. Security policies were defined at the API call level for the behaviors prohibited by the system. Systematic analysis of the program's data flow diagram and tracking of registry data and memory variables make it easy for analysts to understand the relevant behaviors of the program. At present, this software mainly targets Win32 PE format code on the Intel IA32 architecture and can detect most of the currently popular email viruses. Hung-min Sun [12] noticed that PE format malicious code needs to call the Win32 API to interact with the Windows operating system, and further analyzed how malicious code calls API functions. Hook technology was used to design a method for monitoring API calls. Once an abnormal API call was found, the program could be judged to have malicious behavior, and the malicious operations could be halted in time. However, this method was not intelligent and had difficulty coping with obfuscated malicious code. Preda et al. [13] found that traditional syntax-based malicious code detection (such as signature matching) is vulnerable to code obfuscation, and proposed a new detection method based on a formal semantics framework to counter several obfuscation methods commonly used in viruses. The authors formally proved the soundness and completeness of this system, and took the semantics-aware detector proposed by Christodorescu et al. [14] as an example to show the feasibility of the method. Hyungjoon Lee [15] discussed applications of biological immunology to virus detection, mainly studying negative selection, matcher generation and other algorithms. Because of the high complexity of these algorithms, the requirement of real-time detection could hardly be met.

In this paper, guided by statistical learning theory, we study automatic virus detection and put forward a novel ensemble method that integrates dynamic and static virus detection on the basis of the D–S theory of evidence. The detection system uses support vector machines (SVMs) as member classifiers for modeling the dynamic behavior of viruses, and probabilistic neural networks (NNs) as member classifiers for static behavior modeling [16,30]. The heterogeneous classifiers are then combined using the D–S theory of evidence, which improves the accuracy of the ensemble virus detector. The experimental results show that this approach detects unknown and metamorphic viruses well. The paper is structured as follows: Section 2 introduces the foundations of the D–S theory of evidence, Section 3 elaborates on the virus detection engine, Section 4 presents the experimental results and analysis, and Section 5 concludes the paper.

2. Background of D–S theory of evidence

The D–S theory of evidence was put forward by Dempster in 1967 [17] and later developed by Shafer into a complete mathematical theory of reasoning [18]. It can be regarded as a generalization of classical probabilistic reasoning over a finite domain, characterized mainly by its support for different levels of precision and its direct representation of uncertainty due to lack of knowledge. The theory supports probabilistic reasoning, diagnosis, risk analysis and decision support, and has specific applications in multi-sensor networks and information security.

The D–S theory of evidence is built on a non-empty finite set $\Theta$, known as the frame of discernment, which consists of mutually exclusive and exhaustive elements. Every possible hypothesis corresponds to a subset of $\Theta$, and the power set of $\Theta$ contains all possible hypotheses. The most basic concept of the theory is a function expressing the degree to which a piece of evidence supports a system state, known as the basic probability assignment (BPA) function.

Definition 1. If a function $m: 2^{\Theta} \to [0,1]$ satisfies $m(\emptyset) = 0$ and $\sum_{A \subseteq \Theta} m(A) = 1$, then $m$ is known as the probability assignment function on $2^{\Theta}$ and $m(A)$ is known as the basic probability of $A$, where $\emptyset$ is the null set.

Definition 2. If a function $\mathrm{bel}: 2^{\Theta} \to [0,1]$ satisfies $\mathrm{bel}(A) = \sum_{B \subseteq A} m(B)$ for all $A \subseteq \Theta$, $\mathrm{bel}(\emptyset) = 0$ and $\mathrm{bel}(\Theta) = 1$, then $\mathrm{bel}$ is known as the belief function.

The D–S theory of evidence also provides a combination rule for multiple pieces of evidence, namely the Dempster rule.

Definition 3 (Dempster rule). If $m_1$ and $m_2$ are the probability assignment functions of two pieces of evidence, then the probability assignment function of the combined evidence is

$$m(A) = m_1(A) \oplus m_2(A) = \begin{cases} k \sum_{B \cap C = A} m_1(B)\, m_2(C), & A \neq \emptyset \\ 0, & A = \emptyset \end{cases} \tag{1}$$

where $k$ is a normalization factor, and

$$k^{-1} = 1 - \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C) = \sum_{B \cap C \neq \emptyset} m_1(B)\, m_2(C) \tag{2}$$

Multiple probability assignment functions, if they can be combined, are integrated into a single probability assignment function through the following operation:

$$m(A) = m_1 \oplus m_2 \oplus \cdots \oplus m_n (A) = K^{-1} \sum_{\cap A_i = A}\ \prod_{1 \le i \le n} m_i(A_i) \tag{3}$$

where

$$K = \sum_{\cap A_i \neq \emptyset}\ \prod_{1 \le i \le n} m_i(A_i)$$
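As an illustration of Definitions 1 to 3, the following is a minimal Python sketch of the Dempster rule for two BPAs. It is not code from the paper: the dictionary representation (a `frozenset` of hypotheses mapped to its mass), the function names `dempster_combine` and `belief`, and the numeric masses are all assumptions chosen for the example.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two BPAs (dict: frozenset -> mass) with the Dempster rule, Eqs. (1)-(2)."""
    raw = {}
    conflict = 0.0
    for (b, mass_b), (c, mass_c) in product(m1.items(), m2.items()):
        inter = b & c
        if inter:
            # Non-empty intersection: the product supports B ∩ C.
            raw[inter] = raw.get(inter, 0.0) + mass_b * mass_c
        else:
            # Empty intersection: the product counts as conflict.
            conflict += mass_b * mass_c
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence: the rule is undefined")
    k = 1.0 / (1.0 - conflict)  # normalization factor k of Eq. (2)
    return {a: k * v for a, v in raw.items()}

def belief(m, a):
    """bel(A): sum of the masses of all focal sets contained in A (Definition 2)."""
    return sum(v for b, v in m.items() if b <= a)

# Illustrative (made-up) masses on the frame {N, A}.
N, A = frozenset({"N"}), frozenset({"A"})
THETA = N | A
m1 = {N: 0.6, A: 0.1, THETA: 0.3}
m2 = {N: 0.5, A: 0.2, THETA: 0.3}
m = dempster_combine(m1, m2)
```

With these masses the conflicting products sum to 0.17, so the combined mass on N is 0.63/0.83 and the combined masses again sum to one.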

3. Virus detection engine

Combining multiple classifiers, rather than selecting the single best classifier, can improve performance. Different classifiers may provide complementary information, which is what makes their combination beneficial [19]. In 1990, Hansen and Salamon [20] studied the combined performance of multiple neural networks. They proved that an ensemble of neural networks performs better than the single best member network, and on this basis proposed the neural network ensemble method. Afterwards, many scholars studied ensembles of SVMs, aiming at better generalization ability than that of a single SVM, and obtained satisfactory results [21,22]. In the virus detection system proposed in this paper, multiple SVMs are used as member classifiers to model the dynamic behavior of viruses, and multiple probabilistic neural networks are used as member classifiers for static behavior modeling. Finally, the D–S theory of evidence is used to combine the detection results of all member classifiers into the final detection decision.

3.1. System framework

The framework of the virus detection system based on the D–S theory of evidence is shown in Fig. 1. The system takes both the dynamic behavior characteristics and the static characteristics of a program into account and extracts two classes of feature vectors to represent the pattern of a sample program. One consists of the API functions used by the program; the other is n-gram information extracted statically from the Portable Executable (PE) program. The detection engine monitors and analyzes the program's behavior, which makes it effective at detecting unknown viruses and various polymorphic viruses. In the static analysis, probabilistic and statistical methods are used to discover implicit information in the n-grams. This information can reveal the automatic virus generator, compiler and programming environment used to build a virus, or even some programming habits of its author, and it also helps to resist countermeasures by the virus author. The member classifiers in this system are probabilistic neural networks (NNs) and SVMs. The Bagging algorithm is used to generate the member classifiers of the ensemble, and they are combined using the Dempster–Shafer theory of evidence. When training the member classifiers, the features input to the NNs are the program's n-gram information and those input to the SVMs are the API function call information, which increases the difference and independence between the classifiers.

3.2. Feature selection

3.2.1. Static feature selection based on information gain

The substrings of fixed length n cut from the continuous binary data stream of an executable program are known as n-grams. In this paper, n-gram analysis is applied to the detection of Windows PE format program files. We used the UltraEdit utility to convert each executable to hexadecimal codes in ASCII format. We then produced n-grams (for n = 2) by combining each 2-byte sequence into a single term. For instance, for the byte sequence B8 00 FF 3B 12, the corresponding n-grams are B800, 00FF, FF3B and 3B12.

Static features are selected on the basis of information entropy. Information gain reflects the importance of a feature for classification and can therefore serve as a criterion for feature selection. We first calculate the information gain of each feature and then sort the features by this value. The top t features are selected to form the feature set, and features with smaller information gain are discarded in order to reduce the dimension of the feature space. According to information theory, the entropy of the variable X is

$$H(X) = -\sum_i P(x_i) \log_2 P(x_i) \tag{4}$$

The conditional entropy of the variable X given the variable Y is

$$H(X \mid Y) = -\sum_j P(y_j) \sum_i P(x_i \mid y_j) \log_2 P(x_i \mid y_j) \tag{5}$$

where $P(y_j)$ denotes the prior probability of each value of the variable Y, and $P(x_i \mid y_j)$ is the posterior probability of $x_i$ given $y_j$. The information gain is defined as $IG(X \mid Y) = H(X) - H(X \mid Y)$. It expresses that part of the uncertainty about X is removed once Y is observed, which lowers the information entropy of X. In this paper, the information entropy is calculated as

$$H(X) = -\big[P(x \text{ is normal}) \log_2 P(x \text{ is normal}) + P(x \text{ is abnormal}) \log_2 P(x \text{ is abnormal})\big] \tag{6}$$

The conditional entropy $H(X \mid y_i)$ is calculated by the following equation:

$$\begin{aligned} H(X \mid y_i) = {} & -P(y_i = 0)\big[P(x \text{ is normal} \mid y_i = 0) \log_2 P(x \text{ is normal} \mid y_i = 0) \\ & \qquad + P(x \text{ is abnormal} \mid y_i = 0) \log_2 P(x \text{ is abnormal} \mid y_i = 0)\big] \\ & -P(y_i = 1)\big[P(x \text{ is normal} \mid y_i = 1) \log_2 P(x \text{ is normal} \mid y_i = 1) \\ & \qquad + P(x \text{ is abnormal} \mid y_i = 1) \log_2 P(x \text{ is abnormal} \mid y_i = 1)\big] \end{aligned} \tag{7}$$

where $y_i = 0$ means that the feature $y_i$ does not occur in the sample, and $y_i = 1$ means that it does. In our experiment we selected the top 200 n-grams as classification features.
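The n-gram extraction and the information gain ranking of Eqs. (4) to (7) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: each sample is represented as the set of 2-byte n-grams it contains, labels are 1 for virus and 0 for normal, and the function names (`byte_ngrams`, `information_gain`, `top_ngrams`) and the toy data are hypothetical, not taken from the paper.

```python
import math

def byte_ngrams(data, n=2):
    """Return the set of n-byte substrings of `data` as uppercase hex strings."""
    return {data[i:i + n].hex().upper() for i in range(len(data) - n + 1)}

def binary_entropy(positives, total):
    """Entropy of a two-class (virus / normal) distribution, as in Eq. (6)."""
    if total == 0:
        return 0.0
    h = 0.0
    for count in (positives, total - positives):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h

def information_gain(samples, labels, gram):
    """IG = H(X) - H(X | y), where y says whether `gram` occurs in a sample (Eq. (7))."""
    total = len(samples)
    h_x = binary_entropy(sum(labels), total)
    with_gram = [y for s, y in zip(samples, labels) if gram in s]
    without = [y for s, y in zip(samples, labels) if gram not in s]
    h_cond = sum(len(part) / total * binary_entropy(sum(part), len(part))
                 for part in (with_gram, without))
    return h_x - h_cond

def top_ngrams(samples, labels, t):
    """Rank all n-grams by information gain (ties broken alphabetically), keep the top t."""
    vocabulary = set().union(*samples)
    ranked = sorted(vocabulary,
                    key=lambda g: (-information_gain(samples, labels, g), g))
    return ranked[:t]

# Toy data: "B800" occurs only in the two virus samples.
samples = [{"B800", "00FF"}, {"B800", "3B12"}, {"00FF"}, {"3B12"}]
labels = [1, 1, 0, 0]
selected = top_ngrams(samples, labels, 1)
```

In this toy set the n-gram B800 separates the two classes perfectly, so its information gain equals the full class entropy of one bit, while 00FF and 3B12 carry no information.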

Fig. 1. Virus detection system framework.


3.2.2. Dynamic feature selection based on API function calls

This paper mainly studies automatic detection of Win32 PE format viruses. We take the API functions that a program calls in its loaded dynamic link libraries (DLLs) as the classification features of the program under test. Tracing the API calls of the programs in the sample library yields a large number of system calls. Different API functions play different roles in virus recognition. An API call that appears with high frequency in virus files but with low frequency in normal program files contributes more to virus recognition, so such APIs need to be extracted to form the feature set. In our experiments we adopt the mean-square deviation method from probability and statistics. The mean-square deviation of the frequency between the classes reflects well the contribution of each API call: the contribution of a called API function to classification is proportional to the mean-square deviation of its frequency between the classes. The calculation process is as follows:

Step 1. Trace the system calls of all samples in the training library to obtain the sequence of called APIs $A = \{a_1, a_2, \ldots, a_m\}$. Calculate $f(a_{ij}^{v})$, the frequency with which each API function $a_i$ appears in each virus program $v_j$, and $f(a_{ij}^{n})$, its frequency in each normal program $n_j$.

Step 2. Calculate the mean frequency of each called API function $a_i$ in the virus programs and in the normal programs: $D(f(a_i^{v})) = \frac{1}{s} \sum_{j=1}^{s} f(a_{ij}^{v})$, where $s$ is the number of virus samples, and $D(f(a_i^{n})) = \frac{1}{l} \sum_{j=1}^{l} f(a_{ij}^{n})$, where $l$ is the number of normal programs.

Step 3. Calculate the overall mean frequency of each API function $a_i$ in the data set, i.e., $D(f(a_i)) = \big(D(f(a_i^{v})) + D(f(a_i^{n}))\big)/2$.

Step 4. Calculate the mean-square deviation of the frequency between the classes for each API function $a_i$, namely

$$M(a_i) = \sqrt{\big(D(f(a_i)) - D(f(a_i^{v}))\big)^2 + \big(D(f(a_i)) - D(f(a_i^{n}))\big)^2}$$

Step 5. Sort all API functions by the value of the mean-square deviation M, and select the top t features to form the feature set.
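Steps 1 to 5 can be sketched in Python as follows. This is a hedged illustration rather than the authors' implementation: each traced program is assumed to be summarized as a dictionary mapping an API name to its call frequency, and the function names (`class_mean`, `api_scores`, `select_top`) and the example frequencies are hypothetical.

```python
import math

def class_mean(freqs, api):
    """Mean frequency of `api` over the programs of one class (Step 2)."""
    return sum(f.get(api, 0.0) for f in freqs) / len(freqs)

def api_scores(virus_freqs, normal_freqs):
    """Mean-square deviation M(a_i) of every API seen in either class (Steps 2-4)."""
    apis = set().union(*virus_freqs, *normal_freqs)
    scores = {}
    for api in apis:
        dv = class_mean(virus_freqs, api)    # D(f(a_i^v))
        dn = class_mean(normal_freqs, api)   # D(f(a_i^n))
        d = (dv + dn) / 2                    # overall mean, Step 3
        scores[api] = math.sqrt((d - dv) ** 2 + (d - dn) ** 2)
    return scores

def select_top(scores, t):
    """Step 5: keep the t APIs with the largest deviation (ties broken by name)."""
    return sorted(scores, key=lambda a: (-scores[a], a))[:t]

# Made-up frequencies: WriteFile is frequent in viruses only.
virus_freqs = [{"WriteFile": 0.4, "CreateFileA": 0.1},
               {"WriteFile": 0.6, "CreateFileA": 0.1}]
normal_freqs = [{"CreateFileA": 0.1},
                {"CreateFileA": 0.1, "WriteFile": 0.0}]
scores = api_scores(virus_freqs, normal_freqs)
features = select_top(scores, 1)
```

Because the overall mean lies midway between the two class means, M reduces to the absolute difference of the class means divided by the square root of two, so an API used equally often by both classes scores zero.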

3.3. Generation of member classifiers

The commonly used methods for generating individual member classifiers are Boosting and Bagging. The main idea of Bagging [23] is repeated sampling: each member classifier is trained on examples drawn at random, with replacement, from the original training set. Repeatedly resampling the training set increases the difference between the individual classifiers, which improves the generalization ability of the ensemble. In Boosting, the training set of each member classifier depends on the performance of the classifiers produced before it: examples misjudged by the existing classifiers appear with higher frequency in the training set of a new classifier. Considering the time cost of the detection system, we choose Bagging to generate the member classifiers, because classifiers generated in this way can be trained in parallel. The procedure is as follows. Given a training set S, a series of training subsets $S_1, S_2, \ldots, S_T$ is obtained by repeated sampling. The information gain algorithm is then applied to each training subset to select the static attributes that are important for classification, which are input to an NN to produce an individual member NN. Likewise, the dynamic attributes that are important for classification are selected and input to an SVM to build an individual member SVM classifier. Finally, the selected member classifiers are fused using the D–S theory of evidence, and the results are compared with those of the voting method. The pseudocode of the algorithm is shown in Table 1.

3.4. Member classifier combination based on D–S theory of evidence

3.4.1. Probability assignment method based on the recognition performance of classifiers

In the theory of evidence, the probability belief function is the basic premise of reasoning. In a system that combines multiple classifiers, the classification performance of each member classifier can be taken as its basic belief function value. Xu et al. [24] put forward a probability assignment method that uses the recognition rate, error rate and rejection rate to measure the belief of each member classifier, and applied it to handwriting recognition. Their experimental results showed that the method was very robust and performed better than the voting method. We apply the D–S theory of evidence to virus detection. We first introduce the method of [24] and then remedy its deficiency. Consider N classifiers $e^{(1)}, e^{(2)}, \ldots, e^{(N)}$ applied to a K-class problem, with each class denoted $\theta_k$, $k = 1, 2, \ldots, K$. For convenience, rejection by a classifier is defined as the (K+1)th class $\theta_{K+1}$. For the classifier $e^{(n)}$, the symbols $\varepsilon_r^n$, $\varepsilon_s^n$ and $\varepsilon_{rej}^n$ denote its recognition rate, error rate and rejection rate, respectively. Therefore, for a test pattern x, if it is assigned to the class $\theta_k$

Table 1
D–S Bagging algorithm.

A: Training process
Input: The training set $S = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$, where $x_i \in X$ and $y_i \in Y = \{1, 2, \ldots, k\} = K$; learning machines $C^{NN}$ and $C^{SVM}$; the iteration number T; the size d of each bag.
1. $W_{IG} = \mathrm{IG\_Ngram}(S_t)$   // Obtain static features of the sample
2. $W_{API} = \mathrm{MSE\_API}(S_t)$   // Obtain dynamic features of the sample
3. for t = 1, 2, ..., T {
4.    $S_t \leftarrow$ randomly retake d samples from S;
5.    $C_t^{NN} \leftarrow C^{NN}(S_t, W_{IG})$;   // Use $S_t$ to learn $C^{NN}$: train NN-based member classifiers
6.    $C_t^{SVM} \leftarrow C^{SVM}(S_t, W_{API})$;   // Use $S_t$ to learn $C^{SVM}$: train SVM-based member classifiers
   }
Output: $C_t^{NN}$, $C_t^{SVM}$.

B: Test process
Input: Test sample x.
Output: $E(x) = \arg\max_{i \in K} \{\mathrm{bel}(\theta_i) \mid \forall i,\ \mathrm{bel}(\theta_i) \le \alpha\}$   // Use D–S theory of evidence for fusion
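The training loop of Table 1 can be sketched in Python as below. This is an assumption-laden sketch, not the authors' code: `majority_learner` is a deliberately trivial, hypothetical stand-in for the NN and SVM learners, feature selection is omitted, and the sample data are made up. Only the bootstrap resampling structure of Bagging is illustrated.

```python
import random

def bootstrap(samples, d, rng):
    """Draw d samples from the training set at random, with replacement."""
    return [rng.choice(samples) for _ in range(d)]

def majority_learner(bag):
    """Hypothetical stand-in learner: always predicts the majority label of its bag."""
    labels = [y for _, y in bag]
    winner = max(sorted(set(labels)), key=labels.count)
    return lambda x: winner

def train_members(samples, learners, rounds, d, seed=0):
    """For each of `rounds` bags, train one member per learner on that bag."""
    rng = random.Random(seed)
    members = []
    for _ in range(rounds):
        bag = bootstrap(samples, d, rng)
        members.append([learn(bag) for learn in learners])
    return members

# Made-up (feature vector, label) pairs; label 1 = virus, 0 = normal.
samples = [((0.1,), 0), ((0.2,), 0), ((0.9,), 1), ((0.8,), 1), ((0.7,), 1)]
members = train_members(samples, [majority_learner, majority_learner],
                        rounds=3, d=4)
```

Each round produces one pair of members trained on the same bag, mirroring steps 4 to 6 of the table; since the bags are independent of one another, the rounds could also be run in parallel.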

by the classifier $e^{(n)}$, its basic probability assignment is defined as

$$\begin{cases} m_n(\theta_k) = \varepsilon_r^n \\ m_n(\neg\theta_k) = \varepsilon_s^n \\ m_n(\Theta) = \varepsilon_{rej}^n \end{cases} \tag{8}$$

Therefore, for the test sample x, each member classifier yields a group of basic probability functions, $m_1(\cdot), m_2(\cdot), \ldots, m_N(\cdot)$, N groups in total. The Dempster combination rule can then be used to combine these BPAs:

$$m(\cdot) = m_1(\cdot) \oplus m_2(\cdot) \oplus \cdots \oplus m_N(\cdot) \tag{9}$$

Based on the combined BPA, the corresponding belief function values of each class can be calculated:

$$\mathrm{bel}(\theta_j),\ \mathrm{bel}(\neg\theta_j), \quad j = 1, 2, \ldots, K \tag{10}$$

Finally, the decision rule of the combined classifier is

$$E(x) = \max_j \big(\mathrm{bel}(\theta_j)\big) \tag{11}$$

The decision rules of the combined classifier in some special cases are as follows:

(1) If all member classifiers reject the pattern x, the ensemble classifier also rejects it, namely $E(x) = K + 1$.
(2) If M (M < N) classifiers reject the pattern x, they do not take part in the ensemble decision, so only N − M member classifiers participate in the final classification decision.
(3) If the recognition rate of a classifier is 100%, namely $\varepsilon_r^n = 1$, the classifier always decides correctly, so the results of the other classifiers can be ignored.
(4) If the error rate of a classifier is 100%, namely $\varepsilon_s^n = 1$, the classifier always decides wrongly, so its classification results can be ignored.

However, the method of [24] does not consider that a member classifier performs differently on different classes, so its probability assignment is not optimal, which affects the final result of the ensemble classification. We have improved on it. Computer virus detection involves only two classes of data, normal programs and virus programs, so the following frame of discernment can be constructed under the D–S theory of evidence: $\Theta = \{\theta_1, \neg\theta_1, \theta_2, \neg\theta_2\} = \{N, \neg N, A, \neg A\}$, where N represents the set of normal files, A represents the set of viruses, and $N \cap A = \emptyset$, where $\emptyset$ is the null set. The basic belief function is defined as

$$\begin{cases} m: 2^{\{N, \neg N, A, \neg A\}} \to [0, 1] \\ m(\{N, \neg N, A, \neg A\}) = 1 - m(N) - m(\neg N) - m(A) - m(\neg A) \end{cases} \tag{12}$$

For a given test sample x, its basic belief function values with respect to a member classifier $e^{(n)}$ are calculated as follows:

$$\begin{cases} m_n(N) = TP_n\ \mathrm{rate}/2 \\ m_n(\neg N) = FP_n\ \mathrm{rate}/2 \\ m_n(A) = TN_n\ \mathrm{rate}/2 \\ m_n(\neg A) = FN_n\ \mathrm{rate}/2 \end{cases} \tag{13}$$

In the formula, $TP_n$, $FP_n$, $TN_n$ and $FN_n$ denote True Positive, False Positive, True Negative and False Negative. The BPAs of all member classifiers are then combined using the Dempster rule:

$$m = m_{e^{(1)}} \oplus m_{e^{(2)}} \oplus \cdots \oplus m_{e^{(N)}} \tag{14}$$

Finally, the classification rule of the ensemble classifier E is determined:

$$E(x) = \theta_j, \quad \text{if } \mathrm{bel}(\theta_j) = \max_{i \in K} \big(\mathrm{bel}(\theta_i)\big) \tag{15}$$

The above formula does not use all the information relevant to $\theta_i$, namely the uncertainty arising from "don't know", although this can provide useful information for the final decision. The formula is therefore modified as

$$E(x) = \theta_j, \quad \text{if } \mathrm{bel}(\theta_j) = \arg\max_{i \in K} \{\mathrm{bel}(\theta_i) \mid \forall i,\ \mathrm{bel}(\theta_i) \le \alpha\} \tag{16}$$

where $0 < \alpha < 1$. The aim is to reach a recognition rate as high as possible under the constraint of a low error rate.

3.4.2. Probability assignment method based on distance measure between classes

The probability assignment method discussed in Section 3.4.1 is based on the classification performance of a classifier, which depends on the choice of validation set, so the method is not very stable. Whatever the type of a classifier and whatever its working principle, in actual modeling we should try to enlarge the distance between classes: a classifier with stronger separability gives better classification results. The distance between classes can therefore be chosen as the basis for assigning evidence probability to each member classifier [25,26]. As before, N classifiers are applied to a K-class problem, and each class is denoted $\theta_k$ ($k = 1, 2, \ldots, K$). The frame of discernment under the D–S theory of evidence is $\Theta = \{\theta_1, \theta_2, \ldots, \theta_K\}$. If $X_k$ is the training sample matrix of the class $\theta_k$ ($k = 1, 2, \ldots, K$), the feature matrix extracted by the feature selection module of classifier n is written $X_k^{(n)}$. The classifiers can be homogeneous or heterogeneous, and different classifiers have different feature spaces. For convenience, the representation of the samples in the feature space of a classifier is abstracted as a modeling function, denoted

$$\Gamma^{(n)}(X_k) = I_k^{(n)}, \quad k = 1, 2, \ldots, K; \quad n = 1, 2, \ldots, N \tag{17}$$

Evidently, classifiers based on different principles and methods have different modeling functions. For a test sample x, the modeling by each member classifier can be expressed as

$$\Gamma^{(n)}(x) = \Upsilon^{(n)}, \quad n = 1, 2, \ldots, N \tag{18}$$

Based on these representations of the training samples and the test sample, each classifier $e^{(n)}$ can calculate the distance between the test sample and the training samples of each class, which is then normalized and denoted

$$\mathrm{Distance}^{(n)}\big(I_k^{(n)}, \Upsilon^{(n)}\big), \quad k = 1, 2, \ldots, K \tag{19}$$

For any classifier $e^{(n)}$, K distance values between classes are thus obtained and denoted

$$d^{(n)} = \big[d_1^{(n)}\ d_2^{(n)}\ \cdots\ d_K^{(n)}\big]^{T} \tag{20}$$

If the classifier $e^{(n)}$ classifies the test sample x as $\theta_j$, its BPA is calculated by the following formula:

$$m^{(n)}(\theta_j) = \mathrm{logsig}\big(\mathrm{variance}[d^{(n)}]\big) \tag{21}$$

where $\mathrm{variance}[d^{(n)}]$ denotes the variance of the distances between classes, and $\mathrm{logsig}(\cdot)$ is an increasing S-shaped function which maps variable values to the range [0,1]. A simple logistic function

Please cite this article as: B.-y. Zhang, et al., Research on virus detection technique based on ensemble neural network and SVM, Neurocomputing (2014), http://dx.doi.org/10.1016/j.neucom.2013.04.055i

B.-y. Zhang et al. / Neurocomputing ∎ (∎∎∎∎) ∎∎∎–∎∎∎

6

Prove: 1. It is proved that in the case that the recognition framework has two mutually exclusive elements, and Dempster rule meets the associative law: m1:::n ðAÞ ¼ m1:::n 1 ðAÞ mn ðAÞ The proving process is as follows: m1:::n ðAÞ ¼

∑

\ i Ai ¼ A

∑

! m1 ðA1 Þm2 ðA2 Þ:::mn ðAnÞ !

∑

Fig. 2. S-shaped transfer function of Logistic Function.

\ i Ai ¼ A

þ

∑

\ i Ai ¼ A

þ

! m1 ðA1 Þm2 ðA2 Þ:::mn 1 ðAn 1 Þ mn ðAÞ=

∑

m1 ðA1 Þm2 ðA2 Þ:::mn 1 ðAn 1 Þ mn ðAÞ ! m1 ðA1 Þm2 ðA2 Þ:::mn 1 ðAn 1 Þ mn ðΘÞ

∑

m1 ðA1 Þm2 ðA2 Þ:::mn 1 ðAn 1 Þ mn ðNÞ

\ i Ai ¼ N

þ

! !

∑

m1 ðA1 Þm2 ðA2 Þ:::mn 1 ðAn 1 Þ mn ðΘÞ

∑

m1 ðA1 Þm2 ðA2 Þ:::mn 1 ðAn 1 Þ mn ðΘÞ

\ i Ai ¼ N

þ

!

∑

\ i Ai ¼ A

þ

!

∑

\ i Ai ¼ A

þ

m1 ðA1 Þm2 ðA2 Þ:::mn 1 ðAn 1 Þ mn ðAÞ

m1 ðA1 Þm2 ðA2 Þ:::mn 1 ðAn 1 Þ mn ðΘÞ

\ i Ai ¼ Θ

!

m1 ðA1 Þm2 ðA2 Þ:::mn ðAnÞ =

\ i Ai a ∅

¼

ð24Þ

\ i Ai ¼ Θ

!

¼ ðm1:::n 1 ðAÞK n 1 mn ðAÞ þ m1:::n 1 ðAÞK n 1 mn ðΘÞ þ m1:::n 1 ðΘÞK n 1 mn ðAÞÞ=ðm1:::n 1 ðAÞK n 1 mn ðAÞ þ m1:::n 1 ðAÞK n 1 mn ðΘÞ þ m1:::n 1 ðNÞK n 1 mn ðNÞ þ m1:::n 1 ðNÞK n 1 mn ðΘÞ þ m1:::n 1 ðΘÞK n 1 mn ðΘÞÞ

Fig. 3. The ﬂowchart of the probability assignment algorithm based on distance measure between classes.

¼ ðm1:::n 1 ðAÞmn ðAÞ þ m1:::n 1 ðAÞmn ðΘÞ þ m1:::n 1 ðΘÞmn ðAÞÞ=ðm1:::n 1 ðAÞmn ðAÞ þ m1:::n 1 ðAÞmn ðΘÞ þ m1:::n 1 ðNÞmn ðNÞ þ m1:::n 1 ðNÞmn ðΘÞ þ m1:::n 1 ðΘÞmn ðΘÞÞ ¼ m1:::n 1 ðAÞ mn ðAÞ

ð25Þ

2. Based on the above associative law, it can be proved from mathematical induction that may be deﬁned by the formula log sigðtÞ ¼ 1=ð1 þet Þ, as shown in Fig.2. Then, based on Dempster rule, all member classiﬁers' BPAs are combined: mðθk Þ ¼ mð1Þ mð2Þ ::: mðNÞ ðθk Þ

ð22Þ

Finally, the decision rule of the combined classiﬁer is θj ¼ arg maxðmðθk ÞÞ

m1:::n ðAÞ ¼ m1 ðAÞ m2 ðAÞ ::: mn ðAÞ

ð26Þ

The combination formula of two evidences can be obtained within constant time. Therefore, the calculation of combination belief of N member classiﬁers can be completed at the step of n 1 through the above formula, and the cost is OðnÞ. ■

ð23Þ

k

The ﬂowchart of this algorithm is shown in Fig. 3. As we know, the complexity of Dempster combination rule is Pcomplete [27,28], but a computing method with time cost for OðnÞ (n is the number of member classiﬁers) can be obtained in the case that Θ ¼ fN; Ag according to Barnett's studies. Theorem 1. In the case that Θ ¼ fN; Ag, and N \ A ¼ ∅, where ∅ is a null set, the time complexity that n evidences are combined based on Dempster combination rule is OðnÞ.
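The linear-time combination stated in Theorem 1, together with the decision rule of Eq. (23), can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each member classifier's BPA is assumed to be given as a dictionary with the hypothetical keys "N", "A" and "Theta" (masses summing to 1), and the function names `combine_pair`, `combine_all` and `decide` are illustrative only.

```python
from functools import reduce


def combine_pair(m1, m2):
    """Dempster's rule for two BPAs on the frame Theta = {N, A}.

    Each BPA is a dict with keys "N", "A", "Theta" (assumed layout).
    The work done here is constant, independent of the number of classifiers.
    """
    # Unnormalized masses of the three possible non-empty intersections.
    a = m1["A"] * m2["A"] + m1["A"] * m2["Theta"] + m1["Theta"] * m2["A"]
    n = m1["N"] * m2["N"] + m1["N"] * m2["Theta"] + m1["Theta"] * m2["N"]
    theta = m1["Theta"] * m2["Theta"]
    norm = a + n + theta  # total mass left after discarding A-versus-N conflict
    if norm == 0.0:
        raise ValueError("totally conflicting evidence cannot be combined")
    return {"A": a / norm, "N": n / norm, "Theta": theta / norm}


def combine_all(bpas):
    """Fold the pairwise rule over n BPAs: n - 1 constant-time steps."""
    return reduce(combine_pair, bpas)


def decide(m):
    """Pick the singleton class with the largest combined mass."""
    return max(("N", "A"), key=lambda k: m[k])


if __name__ == "__main__":
    # Made-up BPAs for three member classifiers, for illustration only.
    evidence = [
        {"A": 0.6, "N": 0.1, "Theta": 0.3},
        {"A": 0.5, "N": 0.2, "Theta": 0.3},
        {"A": 0.2, "N": 0.5, "Theta": 0.3},
    ]
    combined = combine_all(evidence)
    print(combined, decide(combined))
```

Because the rule is associative on this frame, folding the classifiers one at a time gives the same result as any other grouping, which is what makes the single left-to-right pass sufficient.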

4. Experiments

There are a total of 632 samples in the data set, of which 423 are normal programs and 209 are virus programs (see Appendix A). The API call information of each sample is monitored and recorded in advance in a virtual machine, and the n-gram information of each sample is extracted at the same time. Afterwards, features are selected. There are two kinds of member classifiers in the ensemble classifier, i.e., SVM and NN. For SVM, the kernel function selected is the radial basis function (RBF), and the misclassification penalty factor C and the kernel parameter γ are set to 2 and 0.015, respectively. In NN, there is only one adjustable parameter, SPREAD, which can be easily determined. The generation of individual members and the training process are shown in Table 1. In order to improve the performance, variables are normalized.

When training the NN-based member classifiers, the data for each class is collected and the n-grams with their normalized frequencies are counted. The L most frequent n-grams with their normalized frequencies represent a class profile. Similarly, when training the SVM-based member classifiers, the API features of the sample are used.

In the experiments, we have compared the performance of the static virus detection method, the dynamic detection method and the integrated detection method; the comparison results are shown in Table 2. For visual analysis, ROC graphs of the different detection methods have been drawn, as shown in Figs. 4–7.

Table 2 Detection results of different ensemble classifiers.

| Ensemble method | Accuracy (%) | Area under ROC curve (AUC) | Features |
|---|---|---|---|
| 1. Distance ensemble based D–S | 98.73 | 0.993 | API and n-gram |
| 2. Classwise ensemble based D–S | 98.58 | 0.991 | API and n-gram |
| 3. Noneclasswise ensemble based D–S | 97.31 | 0.964 | API and n-gram |
| 4. Ensemble SVM based bagging | 96.52 | 0.921 | API |
| 5. Ensemble NN based IG-bagging | 96.99 | 0.914 | n-gram |

Note: 1 – probability assignment based on distance measure, D–S ensemble; 2 – probability assignment based on the recognition performance of a classifier, distinguishing the classifier's different recognition performance for different classes of samples, D–S ensemble; 3 – probability assignment based on the recognition performance of a classifier, without distinguishing the classifier's recognition performance for different classes of samples, D–S ensemble; 4 – using the Bagging method for SVM ensemble; 5 – using the IG-Bagging method for NN ensemble.

Fig. 4. ROC graph of the ensemble detector proposed in the literature [24] with ensemble NN and SVM.

Fig. 5. ROC graph of the modified D–S ensemble detector with ensemble NN and SVM.

Fig. 6. ROC graph of the D–S ensemble detector based on distance measure with ensemble NN and SVM.

Fig. 7. ROC graph of the virus detection system with different ensemble methods.

The following conclusions can be obtained from the experimental results:

(1) Whatever ensemble method is used, the accuracy of the detection method which integrates dynamic virus detection and static detection is higher than that of a single detection method. The reason is that the characteristic quantities input to the ensemble classifier are the program's API information and n-gram information, and these are unrelated to each other, which

increases the difference between classifiers to the greatest extent, and thus improves the performance of the detection system.

(2) To verify the performance difference between the D–S Bagging method proposed in this article (called Classwise Ensemble) and the ensemble method in the literature [24] (called Noneclasswise Ensemble), we evaluate the two algorithms, and the experimental results are shown in Fig. 7. It can be seen from the figure that the area under the ROC curve of Classwise Ensemble is greater than that of Noneclasswise Ensemble, which indicates that the proposed method is better than that in the literature [24]. The reason is that the ensemble algorithm in the literature [24] does not consider the fact that member classifiers have different classification performance for different classes of samples, and its probability assignment method is not the best, so the final results of ensemble classification are affected.

(3) When D–S theory of evidence is used to combine member classifiers, we compare, on the data set, the classification performance of a method which takes the distance measure between classes as the basis for probability assignment (recorded as Distance Ensemble) with that of a method which takes the recognition performance of a classifier as the basis; the results are shown in Fig. 7. It can be seen from the figure that their areas under the ROC curve are roughly equal, which indicates that the two methods have equivalent performance. This shows that the evidence probability assignment method based on a distance measure between classes proposed in this article is feasible, which provides a new way of thinking for the combination of heterogeneous member classifiers.

Finally, we have tested the system's detection effects on metamorphic viruses. It is noted that the research object of the detection method proposed in the literature [4] is limited to common viruses. The detection engine designed in this paper, however, can effectively overcome this deficiency, owing to the combination of the static analysis method with the dynamic analysis detection method based on program behavior monitoring, which can be used to fight against encrypted and metamorphic viruses. To verify detection effects, we select some popular Win32 PE format viruses: Blaster, Beagle and FunLove. The algorithm offered in the literature [29] is used to deform them, so as to obtain different metamorphic versions. For example, v1 is obtained by inserting junk code and modifying data segments, and v2 is obtained by modifying the program's control flow. Then, we use several currently popular antivirus tools and the proposed method for evaluation, and the results are shown in Table 3. It can be seen from the table that the various commercial antivirus tools have higher detection accuracy for common viruses, but lower accuracy for metamorphic code. With the method in the literature [4], we basically cannot fight against metamorphic viruses. Our detection method is effective for both common and metamorphic viruses, and also has the highest detection accuracy.

Table 3 Detection results of different detectors for metamorphic viruses. Samples tested: Blaster, Blaster v1, Blaster v2, Beagle, Beagle v1, Beagle v2, Funlove, Funlove v1, Funlove v2.

| Virus detector | Samples marked as detected (√), out of 9 |
|---|---|
| 1 | 7 |
| 2 | 7 |
| 3 | 6 |
| 4 | 4 |
| 5 | 3 |
| 6 | 9 |

Note: 1 – Norton Antivirus; 2 – Kaspersky Antivirus; 3 – McAfee Antivirus; 4 – Rising Antivirus; 5 – Detection method in the literature [4]; 6 – Detection method in this paper.

5. Conclusion

As we know, both the dynamic detection and static detection technologies have their own advantages and disadvantages, so an integrated method of the two detection techniques has been proposed. In the detection system, member classifiers are essentially heterogeneous and are not suitable for combination with the traditional voting method, so we use D–S theory of evidence to fuse the results of the different classifiers. The integrated detector uses different classes of characteristic quantities when training member classifiers, which increases the independence and difference between member classifiers to a great extent. During the combination of member classifiers, the property that a classifier has different classification performance for different classes of samples has been taken into account, so that the detection accuracy of the ensemble classifier is increased (98.58% versus 97.31% in Table 2). Meanwhile, we note that for a classifier, during actual modeling, we should do our best to expand the distance between classes to improve the diversity of member classifiers. The stronger the separability, the better the classification results.

In this article we have also studied the evidence probability assignment method based on a distance measure between classes, which reasonably reflects the influence of each member classifier on the final decision. The experiment shows that this method has good performance. Although it is known that the complexity of Dempster's combination rule is #P-complete, we prove that, in this research environment, it is possible to obtain a computing method with linear time complexity O(n). Therefore, the virus detection method proposed in this article meets the requirement of high performance. The system cost mainly lies in the feature extraction process. In actual applications, offline preprocessing can be used to save running time.

Furthermore, we have evaluated the detection effects of the ensemble classifier on metamorphic viruses. The experiment also shows that the virus detection method overcomes the deficiency of traditional virus scanners and has good detection effects on unknown and metamorphic viruses, with performance better than that of popular commercial antivirus tools.

Acknowledgments

The authors thank the anonymous reviewers for numerous suggestions that significantly improved the quality of the paper. This work was partly supported by the National Natural Science Foundation of China (Grant nos. 60970034, 61170287, 61232016 and 60973153), the Planned Science and Technology Project of Hunan Province (Grant nos. 2011GK3084, 2012GK3044, and 2013GK2013), the Hunan Provincial Social Science Fund Project (Grant no. 12YBB090) and the Provincial National Natural Science Foundation of Hunan (Grant no. 13JJ3125).

Appendix A. Virus samples in the data set

A total of 209 samples; the viruses were detected by Kaspersky Anti-Virus On-Demand Scanner. See Table A1.

Table A1.

Win32.Adson.1559 Win32.Aidlot Win32.Akez Win32.Alcaul.a Win32.Aldebaran.8365.a Win32.Aliser.7825 Win32.Alma.2414 Win32.Andras.7300 Win32.AOC.3649.a Win32.Apathy.5378 Win32.Apparition.a Win32.Arianne.1052 Win32.Aris Win32.Artelad.2173 Win32.Asorl.a Win32.Atav.1939 Win32.Atav.2073 Win32.Auryn.1155 Win32.AutoWorm.3072 Win32.Awfull.3571 Win32.Bakaver.a Win32.Barum.1536 Win32.Bee Win32.Beef.2110 Win32.Belial.2609 Win32.Belial.a Win32.Belial.b Win32.Belod.a Win32.Bender.1363 Win32.Benny.3219.a Win32.Bika.1906 Win32.BingHe Win32.Blakan.2016 Win32.Emotion.a Win32.Enar Win32.Enerlam.a Win32.Eva.a Win32.Evar.3587 Win32.Evol.a Win32.Evul.8192.b Win32.Evyl.b Win32.Fighter.b Win32.Flechal Win32.Fosforo.a Win32.Freebid Win32.FunLove.4070 Win32.Gaybar Win32.gen Win32.Genu.c Win32.Ghost.1667 Win32.Ginra.3334 Win32.Ginseng Win32.Giri.4937.b Win32.Gloria.2928 Win32.Glyn Win32.Gobi.a Win32.Godog Win32.Halen.2593 Win32.Haless.1127 Win32.Hatred.a Win32.Henky.504 Win32.Heretic.1986 Win32.Hidrag Win32.Highway.b Win32.HIV.6386 Win32.HLL.Fugo Win32.HLLC.Winatch Win32.HLLO.52736 Win32.HLLP.Thembe Win32.HLLP.VB.a Win32.HLLP.Vedex.b

Win32.HLLW.Xelif Win32.Hortiga.4800 Win32.Htrip.a Win32.Htrip.c Win32.Idele.2108 Win32.Idyll.1468 Win32.IKX Win32.Insom.1972 Win32.Intar.1151 Win32.InvictusDLL.101.a Win32.InvictusDLL.b Win32.Ipamor.a Win32.Jethro.5657 Win32.Kala.7620 Win32.Kanban.a Win32.Kaze.2056 Win32.Kenston.1895.a Win32.Ketan Win32.Kiltex Win32.Klinge Win32.KME Win32.KMKY Win32.Knight.2350 Win32.Koru Win32.Kriz.3660 Win32.Kuto.2058 Win32.Ladmar.2004 Win32.Lamchi.a Win32.Lamebyte Win32.Lamewin.1751 Win32.Lamicho.a Win32.Lanky.3153 Win32.Lash.a Win32.LazyMin.31 Win32.Legacy Win32.Levi.3236 Win32.Libertine.d Win32.Magic.1590 Win32.Matrix.817.a Win32.Mauz.a Win32.Mental Win32.Miam.5164 Win32.Milen.3205 Win32.Minit.b Win32.MircNew Win32.Mix.1852 Win32.Mockoder.1120 Win32.Mogul.6800 Win32.Mooder.g Win32.Mystery.2560 Win32.Nicolam Win32.Noise.410 Win32.Nox.2290 Win32.Opdoc.1204 Win32.Oporto.3076 Win32.Orez.6291 Win32.Oroch.5420 Win32.Padic Win32.Paradise.2168 Win32.Parite.b Win32.Parvo Win32.Peana Win32.Perrun.a Win32.PGPME Win32.Pilsen.4096 Win32.Positon.4668 Win32.Projet.2404.b Win32.Qozah.3361 Win32.RainSong.4266 Win32.Ramdile Win32.Razenya

Win32.Redart.2796 Win32.Redemption.a Win32.Refer.2939 Win32.Resur.a Win32.Revaz Win32.Rever Win32.Rhapsody.2619 Win32.Riccy.a Win32.Rivanon Win32.Ruff.4859 Win32.Rufoll.1432 Win32.Rutern.5244 Win32.Sadon.900 Win32.Sandman.4096 Win32.Sankei.1062 Win32.Santana.1104 Win32.Savior.1680 Win32.Score.3072.a Win32.Segax.1136 Win32.Sentinel.a Win32.Seppuku.2763 Win32.Shaitan.3550 Win32.Shan.1842 Win32.Shown.538 Win32.Silcer Win32.Slaman.i Win32.Slow.8192.a Win32.Small.1144 Win32.Spelac.1008 Win32.Spit.a Win32.Staro.1538 Win32.Stepar.b Win32.Stupid.a Win32.Taek.1275 Win32.Tapan.3882 Win32.This31.16896 Win32.Thorin.b Win32.Tinit.a Win32.Tirtas.5675 Win32.Tolone Win32.Tosep.1419 Win32.Trion.a Win32.Ultratt.8152 Win32.Undertaker.4883.a Win32.Vampiro.a Win32.VCell.3504 Win32.VChain Win32.Velost.1186 Win32.Viset.b Win32.Vorcan Win32.Vulcano Win32.Wabrex.a Win32.Wanhope.1834 Win32.Weird.c Win32.Wide.b Win32.Wit.a Win32.Wolf.b Win32.Xorala Win32.Yasw.924 Win32.Yerg.9412 Win32.Younga.2384.a Win32.Zawex.3196 Win32.ZHymn.a Win32.ZHymn.Host Win32.ZMist Win32.Zombie Win32.ZPerm.a


References

[1] G. Tesauro, J. Kephart, G. Sorkin, Neural networks for computer virus recognition, IEEE Expert 11 (1996) 5–6.
[2] W. Arnold, G. Tesauro, Automatically generated Win32 heuristic virus detection, in: Proceedings of the 2000 International Virus Bulletin Conference, 2000.
[3] T.A. Assaleh, N. Cercone, V. Keselj, R. Sweidan, Detection of new malicious code using N-grams signatures, in: Proceedings of the Second Annual Conference on Privacy, Security and Trust, 2004, pp. 193–196.
[4] J.Z. Kolter, M.A. Maloof, Learning to detect malicious executables in the wild, in: Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM Press, New York, 2004, pp. 470–478.
[5] I. Witten, E. Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufmann, San Francisco, 2000.
[6] J. Bergeron, M. Debbabi, J. Desharnais, M.M. Erhioui, Y. Lavoie, N. Tawbi, Static detection of malicious code in executable programs, in: Symposium on Requirements Engineering for Information Security, 2001, pp. 1–8.
[7] A. Mori, Detecting unknown computer viruses – a new approach, Lect. Notes Comput. Sci. 3233 (2004) 226–241.
[8] V. Skormin, A. Volyn, D. Summerville, J. Moronski, In the search of the "Gene of Self-Replication" in malicious codes, in: Proceedings of the 2005 IEEE Workshop on Information Assurance and Security, West Point, Los Alamitos, 2005, pp. 193–200.
[9] M. Christodorescu, S. Jha, Testing malware detectors, in: Proceedings of ISSTA'04, Boston, Massachusetts, vol. 7, 2004, pp. 34–44.
[10] C. Jesse, R. Rabek, I. Khazan, M. Scott, L. Robert, K. Cunningham, Detection of injected, dynamically generated, and obfuscated malicious code, in: Proceedings of the 2003 ACM Workshop on Rapid Malcode, vol. 10, 2003, pp. 76–82.
[11] A. Mori, T. Izumida, T. Sawada, T. Inoue, A tool for analyzing and detecting malicious mobile code, in: Proceedings of the ICSE'06, Shanghai, China, 2006, pp. 831–834.
[12] H.-M. Sun, Y.-H. Lin, M.-F. Wu, API monitoring system for defeating worms and exploits in MS-Windows system, in: L. Batten, R. Safavi-Naini (Eds.), Lecture Notes in Computer Science, vol. 4058, ACISP, 2006, pp. 159–170.
[13] M. Dalla Preda, M. Christodorescu, S. Jha, S. Debray, A semantics-based approach to malware detection, in: Proceedings of the POPL'07, Nice, France, January, 2007, pp. 377–388.
[14] M. Christodorescu, S. Jha, S.A. Seshia, D. Song, R.E. Bryant, Semantics-aware malware detection, in: Proceedings of the 2005 IEEE Symposium on Security and Privacy (S&P'05), Oakland, CA, USA, 2005, pp. 32–46.
[15] H. Lee, Biologically inspired computer virus detection system, Lect. Notes Comput. Sci. 3141 (2004) 153–165.
[16] L. Guo, D.S. Huang, Human face recognition based on radial basis probabilistic neural network, in: Proceedings of the International Joint Conference on Neural Networks (IJCNN), Portland, OR, 2003, pp. 2208–2211.
[17] A.P. Dempster, On direct probabilities, J. R. Stat. Soc. Ser. B 25 (1963) 102–107.
[18] G. Shafer, A Mathematical Theory of Evidence, Princeton University Press, New Jersey, 1976.
[19] D.S. Huang, Systematic Theory of Neural Networks for Pattern Recognition, Publishing House of Electronic Industry of China, Beijing, 1996.
[20] L.K. Hansen, P. Salamon, Neural network ensembles, IEEE Trans. Pattern Anal. Mach. Intell. 12 (10) (1990) 993–1001.
[21] S. Dimitrios Frossyniotis, S. Andreas, A multi-SVM classification system, in: Proceedings of the Second International Workshop on Multiple Classifier Systems (MCS 2001), LNCS, vol. 2096, 2001, pp. 198–207.
[22] H. Kim, S. Pang, H. Je, et al., Constructing support vector machine ensemble, Pattern Recognit. 36 (12) (2003) 2757–2767.
[23] L. Breiman, Bagging predictors, Mach. Learn. 24 (2) (1996) 123–140.
[24] L. Xu, A. Krzyzak, C. Suen, Methods of combining multiple classifiers and their applications to handwritten recognition, IEEE Trans. Syst. Man Cybern. 22 (3) (1992) 418–435.
[25] D.S. Huang, Radial basis probabilistic neural networks: model and application, Int. J. Pattern Recognit. Artif. Intell. 13 (7) (1999) 1083–1101.
[26] W. Utschick, P. Nachbar, C. Knobloch, et al., The evaluation of feature extraction criteria applied to neural network classifiers, in: Proceedings of the Third International Conference on Document Analysis and Recognition, vol. 1, 1995, pp. 315–318.
[27] P. Orponen, Dempster's rule of combination is #P-complete, Artif. Intell. 44 (1–2) (1990) 245–253.
[28] J.A. Barnett, Computational methods for a mathematical theory of evidence, in: Proceedings of the 7th International Conference on Artificial Intelligence, 1981, pp. 868–875.
[29] A. Sung, J. Xu, P. Chavez, S. Mukkamala, Static analyzer for vicious executables (SAVE), in: Proceedings of the 20th Annual Computer Security Applications Conference, IEEE Computer Society, Washington, DC, USA, 2004, pp. 326–334.
[30] D.S. Huang, J.-X. Du, A constructive hybrid structure optimization methodology for radial basis probabilistic neural networks, IEEE Trans. Neural Netw. 19 (12) (2008) 2099–2115.


Bo-yun Zhang received his B.S. degree in computer science from Hunan University in 2002 and his Ph.D. degree from the National University of Defense Technology in 2007. He is currently a professor of computer science at Hunan Police Academy, Changsha, China. His research interests include pattern recognition and information security.

Shu-Lin Wang received his Ph.D. degree in Computer Science and Technology from the National University of Defense Technology, China, in 2008. He also received his M.Sc. degree in Computer Application from the National University of Defense Technology, China, in 1997, and he obtained his B.Sc. degree in Computer Application from China University of Geosciences in 1989. He is working at Hunan University as an associate professor. His research interests include bioinformatics, software engineering, and complex systems.

Jian-ping Yin received his M.S. degree and Ph.D. degree in Computer Science from the National University of Defense Technology, China, in 1986 and 1990, respectively. He is a full professor of Computer Science at the National University of Defense Technology. His research interests involve artificial intelligence, pattern recognition, algorithm design, and information security.

Xi-Ai Yan received his M.Sc. degree in Software Engineering from Hunan University, China, in 2007, and obtained his B.Sc. degree in Physical Education from Hunan Normal University, China, in 1996. Currently, he is working at Hunan Police Academy as an associate professor. He is also a Ph.D. candidate in Computer Science and Technology at Hunan University. His research interests include software engineering, fault-tolerance computing and information filtering.
