Dynamic VSA: a framework for malware detection based on register contents

Engineering Applications of Artificial Intelligence 44 (2015) 111–122

Mahboobe Ghiasi, Ashkan Sami*, Zahra Salehi
CSE & IT Department, Shiraz University, Shiraz, Iran

☆ The preliminary result of this paper was published as "DyVSoR: Dynamic Malware Detection Based on Extracting Patterns from Value Sets of Registers", ISeCure Journal, 2013.
* Corresponding author. E-mail addresses: [email protected] (M. Ghiasi), [email protected] (A. Sami), [email protected] (Z. Salehi).

Article history: Received 17 November 2014; Received in revised form 4 May 2015; Accepted 20 May 2015; Available online 11 June 2015

Abstract

The number of malware files increases every day because of existing obfuscation techniques. Researchers have recently pursued dynamic analysis to extract the runtime behavior of programs and detect new malware variants. A method is proposed to find similarities of run-time behaviors, based on the assumption that binary behaviors affect register values differently. The idea has been explored in static settings known as VSA, where run-time values were estimated statically. In this research, VSA is extended into a dynamic setting where actual run-time values are used to approximate all the possible values. Due to the large number of values obtained for each binary in every register at run-time, a small representative set, a.k.a. prototypes, is extracted. Unknown files are classified based on comparison to these prototypes only. Experimental results showed that the proposed method outperformed commercial anti-virus applications on the dataset used and reached a classification accuracy of 95.9% with a 4.5% false positive rate. The list of execution traces and the dataset can be found at: http://home.shirazu.ac.ir/~sami/malware. © 2015 Elsevier Ltd. All rights reserved.

Keywords: Malware detection; API call; Dynamic analysis; CPU register values; x86 register values

1. Introduction

A malicious program is defined as software that contains malicious functionality to do harm: it can intrude into a system, tamper with applications or change the OS configuration. Malicious software, or malware, includes viruses, worms, trojans, rootkits, etc. Unfortunately, in the last few years, the number of malware threats has grown exponentially (McAfee Labs, 2014; PandaLab, 2014; Symantec Corporation, 2014), and it seems the trend will continue in the future. According to a report by PandaLabs (2014), more than 15 million unique malware variants were detected, an average of more than 160,000 new specimens per day. Symantec Corporation (2014) announced a 91% increase in targeted attacks against governments, the services industry, manufacturing and finance, and a 62% increase in the number of breaches, exposing over 552 million identities. Hence, traditional methods of malware detection are not adequate to distinguish malicious binaries. The significant growth in new malware threats is due to obfuscation tools and techniques. These tools produce many



different variants of the same malware family. Applying code transformation techniques causes the newly produced malware programs to go undetected by most current detection methods. Common malware obfuscation techniques are stealth, encryption, oligomorphism, polymorphism and metamorphism. A stealth malware conceals the actions it performs from the computer system. Malware programs that use encryption hide their existence by encoding their malicious payload. An oligomorphic malware also encrypts its body to produce several variants of itself and changes its decoder function (Szor, 2005). A polymorphic malware carries its encrypted payload in the same way as oligomorphic malware but contains multiple copies of the decryptor function. A metamorphic malware rewrites its program code and creates a completely different variant of the malicious code; new variants are created by reordering independent instructions, inserting junk code, inserting irrelevant instructions and replacing code with equivalent instructions (Alzarooni, 2012). Malware detection basically consists of static analysis and dynamic analysis. Static analysis explores malware code without executing it, while dynamic analysis relies on run-time values. Dynamic analysis is also used as a complement to improve malware detection; it can capture the semantic behaviors of malware programs to detect different behaviors (Tan and Shar, 2012; Chandramohan and Tan, 2012; Cesare et al., 2013; Zeng et al., 2013). Static analysis is routinely used to detect infected programs. It explores all possible execution paths of programs without execution and provides information on the entire executable logic (Liu et al.,


2013). Tracing all execution paths leads to low false positives. Static analysis consumes a small amount of time and resources to provide information about program content; in static analysis, the binary code of a program is usually disassembled into assembly instructions. In general, static analysis is a fast and safe approach with a low false positive rate (Leder et al., 2009; Comparetti et al., 2010). In contrast to these advantages, static analysis has some limitations. A small change in the source code of most modern malware makes a huge change in the binary code. Today, malware writers use high-level programming languages to write malware, and they use obfuscation tools to generate new variants of malware programs. Although the new versions are semantically equivalent to the original version, static analysis struggles to detect them. Moreover, static analysis relies on approximation as standard practice, which decreases precision (Saxena, 2007). Anti-malware tools are applied to protect against, identify and delete malicious software. Popular anti-malware methods that use static analysis techniques are based on malware signatures. In these techniques, a known pattern of executable code information is extracted; if the extracted opcode pattern of a new binary is similar to a known signature in the database, the new file is detected as malware. The signature database must be updated regularly with new malware signatures. Although these techniques are popular and reliable, they have some problems: if an obfuscated sample enters a system, the anti-malware may not detect it when the database holds no information about the unknown executable, so the detection accuracy of the system depends on the database. Most commercial anti-malware tools hold a huge number of signatures in their database, and maintaining millions of signatures and delivering all updated signatures to users is very time and cost consuming (Abdulalla et al., 2010). Since static analysis suffers from obfuscation technology, dynamic analysis is introduced as a complement to static analysis; it is less sensitive to code obfuscation (Moser et al., 2007). In dynamic approaches, the executable file under consideration must run in a controlled environment called a sandbox, where the activities of the programs are monitored and logged. In the sandbox used here, these activities are the list of APIs with their parameters and the list of register values before and after the calls. Dynamic analysis intercepts Win32 function calls, performs taint analysis and traces data dependencies between API calls and library functions (Li et al., 2010). Dynamic analysis executes a suspicious program to observe the actual information and exact behaviors (Tian, 2011): it reveals process creation and file and registry manipulation precisely, and it involves the real values of memory, registers and variables without the need for approximation or abstraction. Additionally, packing and evasion techniques cannot affect the observed malware behavior. Due to these characteristics, dynamic analysis outperforms static analysis (Tian, 2011). In contrast to the advantages of dynamic detection methods, they have some drawbacks. Although dynamic analysis gives actual information about the control and data flow, it suffers from execution overhead and cannot capture the entire set of malicious behaviors, because it cannot explore all possible execution paths. Running a program in a controlled environment for a determined time is very resource and time consuming. Additionally, environment-aware malware programs may detect the controlled environment by methods such as checking files, registry keys or processes: they postpone their malicious intention, and some other malware programs are triggered only under specific conditions (Bayer et al., 2009). For instance, a botnet carries out its malicious intent only when its botmaster sends commands through a command and control channel (Yahyazadeh and Abadi, 2012). Researchers have used malware API calls or instruction sequences to model the behavior and improve the effectiveness and

accuracy of malware detection (Tian, 2011). For example, Fredrikson et al. (2010) extracted system call graphs of malware and benign programs to detect malware; Bayer et al. (2009) extracted profiles of binary behaviors such as system calls, dependencies between system calls and network analyses; and Lanzi et al. (2010) monitored the interactions of benign programs with OS resources such as files and the registry. The focus of this work is on extracting register values. For this purpose, run-time behavior is recorded in a controlled environment and some important API calls from common DLLs are hooked. A similarity function describing the pair-wise distance between each pair of malware or benign files, based on their register value sets, is used. In addition, the search space is reduced drastically by introducing and computing prototypes, which are representative subsets of all samples. Experiments show that this method is successful in identifying 95.9% of the samples of a diverse dataset with a 4.5% false positive rate. The rest of the paper is organized as follows. Related works that used static and dynamic methods for identifying malware samples are reviewed in Section 2. The overview of our system is given in the next section; Section 3 also describes the proposed method based on Value Set Analysis and the process of the matching phase used to obtain the similarity distance between binary files. In Section 4, the collection procedure for the trace reports in the controlled environment, the experimental settings and the evaluation of the method are discussed. Finally, the conclusion and future work are described.

2. Related works

In this section, some recent papers that use static and behavioral analysis approaches are explored. Some recent studies use static analysis to distinguish clean files from malicious ones. Faruki et al. (2012) statically captured the sequence of API calls and generated the Control Flow Graph (CFG). They converted call graphs to API call-grams, which are representations of API calls that perform a single operation. Song and Touili (2012) extracted the CFG and a state function of a binary by static analysis and applied model-checking techniques to malware identification; model-checking is time consuming and not suitable for a real-time detector. Macedo and Touili (2013) statically constructed trees to find which data flows between functions are malicious. The trees are constructed with system functions, and the parameters passed to them, as nodes; edges specify the data flow between functions. They generated an automaton storing malicious behavior to check whether a new file contains malicious subtrees. Alam et al. (2014) presented a metamorphic malware detection framework using subgraph isomorphism. They divided an assembly program into smaller functions and extracted an abstraction of the assembly instructions, generating a set of CFGs corresponding to the separate functions of a program. To detect an unknown malware, they compared the CFGs of the program with the CFGs extracted from the training dataset. Baldangombo et al. (2013) used the PE-Miner program to extract Portable Executable (PE) header information, DLL names and API function calls. They then selected a subset of features by calculating the frequencies of DLLs and APIs and the Information Gain of the PE header features. Finally, they applied classification techniques to detect malicious files. Alazab et al. (2014) inspected opcode information of sources and calculated opcode frequency statistics of PE files. They applied two different feature selection methods, a filter approach and a wrapper heuristic approach, to develop a hybrid model for a malware detection system. Leder et al. (2009), Shankarapani et al. (2011) and Santos et al. (2013) presented static approaches to detect malware files by measuring the similarity between programs. Leder et al.


(2009) measured the similarity between sets of values using the Jaccard measure. Their approach identified variants by tracing the use of consistent values throughout the samples; this attempt led us to consider malware detection based on register values using the Jaccard distance. Shankarapani et al. (2011) extracted API sequences that appear frequently in a number of malicious files and applied the Cosine measure, the extended Jaccard measure and the Pearson correlation measure, which are popular similarity measures for sequences. Santos et al. (2013) represented executable files using opcodes, opcode sequences and their frequency of appearance. They measured the statistical dependence between opcode frequency and the class of the file to compute the frequency of each opcode sequence, and they used the Cosine similarity function to calculate the similarity between an inspected executable and the dataset. As mentioned previously, anti-antivirus techniques such as polymorphism and metamorphism make malware detection difficult using static extraction alone. Therefore, dynamic analysis is needed as a complement to static techniques (Comparetti et al., 2010). Dynamic analysis is applied to overcome the limitations of signature-based detection. Some studies used API calls and their related properties to model the behavior of samples through a graph, building their graphs in different ways and analyzing and comparing them with different methods. Hu et al. (2009) constructed function call graphs from structural and instruction-level information in the underlying malware programs and created a graph database to find the nearest neighbor of a specific malware graph. Fredrikson et al. (2010) collected data flow dependencies among system calls and represented behaviors as dependence graphs. They generated sets of behavior graphs that describe the semantics exhibited by malicious and benign applications and used graph mining to extract significant behaviors that can distinguish malware from benign applications. Elhadi et al. (2012) proposed a system that combines signature-based with behavior-based detection, using API calls and the operating system resources used by the API calls as graph nodes; the edges represent the references between the nodes. They used the Graph Edit Distance (GED) for inexact graph matching. Cesare et al. (2013) employed dynamic and static analysis to identify malware samples based on similarities of control flow graphs. They used dynamic analysis to reveal the hidden code of packed malware using entropy analysis; afterward, static analysis is used to identify string signatures for exact and approximate identification of flow graphs. The graphs in some of the above-listed works may contain a huge number of nodes and edges that need to be minimized; moreover, comparing graphs to find the similarities between them is time and space consuming, because some of the underlying problems are NP-complete (Elhadi et al., 2012; Skaletsky et al., 2010; Macedo and Touili, 2013). In the following recent dynamic works, researchers used data mining and machine learning techniques to classify programs. Devesa et al. (2010) dynamically monitored a detailed trace of the actions performed by binaries as their behavior. They recorded registry, memory and file APIs and represented an executable as a vector composed of binary characteristics; the binary vector was passed to several classification methods to distinguish malicious from benign binaries. Tian et al. (2010) ran individual malware executables and collected various logs of API calls. They extracted features from the log files based on distinct string frequencies and generated the feature vectors as binary vectors. Qiao et al. (2013) assumed that frequent API call sequences, including API call names and parameters, play an important role in reflecting the behavior of malware, and they applied malware clustering to calculate the similarity between different malware binaries. Chandramohan et al. (2013) considered the set of actions corresponding to each OS


resource, such as the file system, registry, process and network, as a feature. They created a binary feature vector of all features and then used classification techniques to build malware detection models. Xiao et al. (2013) modeled the behavior of files (APIs) as frequent item sets and improved the Apriori algorithm to deal with the large number of generated rules. Qiao et al. (2014) and Rieck et al. (2011) extracted API call sequences for each behavior of a binary and converted them to byte-based sequential data; afterward, they used classification and clustering techniques to find the similarity between binaries. Salehi et al. (2014) assumed that APIs alone cannot represent the similar behavior of samples properly; therefore, they extracted API calls and input arguments together as predictive features. They represented behavior as a binary vector of features and passed the vector to several classification algorithms to classify samples. Some works used similarity measures between samples to detect malware files. Wagener et al. (2008) compared two malware behaviors with each other by observing all the system function calls and leveraged the Hellinger distance to compute the associated distances. Bayer et al. (2009) proposed an approach for clustering malware samples so that malware with similar behavior could be grouped together. They extracted profiles of binary behaviors, such as system calls, dependencies between system calls and network analyses, and used the Jaccard Index to measure the similarities between executable files. Bayer et al. (2010) proposed a system to avoid re-analyzing mutated instances of malware binaries that were already analyzed: they dynamically analyzed each malware for a short period of time to determine whether it is a mutated instance of an already analyzed binary, and they calculated a pair-wise similarity between all behavioral trace reports. Jang et al. (2011) presented a system that clusters malware based on the Jaccard similarity distance, on the premise that malware files that share more features are more similar. They evaluated their system using both static and dynamic analysis: in static analysis, two files are considered similar if they share large code fragments as features; in dynamic analysis, two files are considered similar if they exhibit similar behaviors. Hegedus et al. (2011) used random projection to decrease the dimension of the obtained features and employed the Jaccard measure, the Cosine measure and a modified K-Nearest Neighbor classifier to predict whether an unknown executable is malicious or benign using behavioral data. The listed researchers used different datasets and possibly various sandboxes; they therefore recorded the behavior of files in different formats, and it is not possible to compare their results to determine the best method under various conditions (Tahan et al., 2012). We compare our method with some popular anti-viruses on the same dataset to evaluate the proposed method. Table 1 presents results from various studies.

3. System overview

A system for the dynamic analysis of malware behavior using classification is proposed, whose overview is shown in Fig. 1. The analysis steps are as follows. First, binaries are executed and monitored in a controlled environment, and a report of API calls, with register values before and after invoking the API calls, is generated for each binary. The number of extracted register values is large, and considering all of them for analysis consumes a lot of resources, so the number of features is reduced. Next, some samples that can represent the behavior of the whole dataset, a.k.a. prototypes, are extracted based on the values of the registers. An unknown sample is classified as malware or benign based on the types of the nearest prototypes.


Table 1
Comparison of several malware detection methods.

Study | Analysis type | FP | Detection rate | Feature | Representation
Leder et al. (2009) | Static | 0% | 100% | Data flow tracking and approximate memory content | String information
Faruki et al. (2012) | Static | – | 98.3% | API call sequence | N-gram
Alam et al. (2014) | Static | 4.5% | 98.9% | An abstract representation of an assembly program | Annotated CFG
Alazab et al. (2014) | Static | – | 97% | Opcode sequence | String information
Baldangombo et al. (2013) | Static | 2.7% | 99.6% | PE header information, DLL names and API names | String information
Fredrikson et al. (2010) | Dynamic | 0% | 86% | System calls and data flow dependencies among them | Dependence graph
Devesa et al. (2010) | Dynamic | 1.9% | 94.8% | Registry, memory and file APIs | Binary vector of features
Tian et al. (2010) | Dynamic | 2% | 97.3% | API calls and their parameters as separate features | Binary vector of features
Cesare et al. (2013) | Dynamic | – | 88% | API call | CFG
Qiao et al. (2013) | Dynamic | – | 94.7% | Frequent itemsets of API calls and API arguments | N-gram
Chandramohan et al. (2013) | Dynamic | 1% | 99.6% | Actions corresponding to each OS resource (file system, registry, process/thread) | Binary vector of features
Salehi et al. (2014) | Dynamic | 4.9% | 97.6% | API calls and their arguments | Binary vector of features

Fig. 1. Schematic overview of analysis framework.

3.1. PEs behavior monitoring

Each binary file is executed in a controlled environment and run-time reports based on API system calls are recorded. A secure environment to run binaries is provided, where VMware v8.0.0 (build 471780) is installed on the host OS. The WinApi32 tool (Potier, 2013) is used to capture all invoked API calls of a binary along with all arguments, register contents, the API's returned value, etc. Each file is executed for a specified time of two minutes, or until its execution finishes, whichever comes first. Two minutes is selected because it is long enough for most malware to execute its immediate payload, if it has one (Fredrikson et al., 2010). A clean snapshot of the controlled environment is captured before starting the process of dynamic analysis. After executing each file using the hooking tool, the trace report of the binary file is collected and VMware is reverted to the captured snapshot. This process continues until all files are monitored. A sample output report is shown in Fig. 2 and the whole process of behavior monitoring is illustrated in Fig. 3.

3.2. Values of registers

The main motivation behind the proposed system is that during program execution, instructions, memory contents and

register values may be used instead of the estimation performed in static VSA. The system assumes that the functionality of binaries can be reflected by memory and register contents. Static VSA tries to specify values for all memory contents at each line of assembly code, and it approximates all the values that an instruction operand may contain. The memory content includes register contents, global memory, and stack and heap allocations. The example of Fig. 4 describes the VSA idea more clearly: a similarity score between two files, FileA and FileB, is calculated based on their memory contents, and FileA and FileB are considered two variants of a polymorphic malware if they reach a specified threshold of similarity. Our proposed system does not have the complication of the memory estimation of static VSA. Dynamic VSA uses the values of each memory location or register at run-time, and the collected dataset is used to approximate all the possible values. WinApi32 reports the registers before and after invoking each API call of a binary, but it is not able to capture stack and memory contents. Hence, the propagation and changes of the values of some registers are traced throughout the binaries' executions. Only the values of a subset of registers that relate to logic and arithmetic operations are extracted. The registers that move data or manipulate the stack are not considered, since those registers are


mostly used to prepare the environment for a computation, and hence their values may be less representative of different behaviors. The proposed system converts the run-time trace report L, shown in Fig. 2, to a matrix M. Every two rows of the matrix M belong to one of the invoked API calls in the trace report L: the values of the registers before and after invoking that API call fill the two rows. The new representation of the trace reports is illustrated in Fig. 5. The matrix M is then read column-wise: all the values of the EAX column of the matrix M are collected in a new vector, the EAX value vector. Fig. 6 shows the value vector of the EAX register.

Fig. 2. A portion of a file output trace report from our developed homemade tool.

Fig. 3. Running the sample files in the controlled environment.

3.3. Method

In this section, the proposed method is described. Section 3.3.1 presents the matching phase, a method to compute the similarity between two binaries based on their register value sets. Section 3.3.2 explains the process of extracting the important values of registers, to save time and cost in the similarity computation between files. In the next part, prototypes are extracted from the dataset; prototypes are a small set of dataset samples that represent the behaviors of the whole dataset (Rieck et al., 2011). Unknown files are compared only with these representative samples in order to find their label. This process is introduced in Section 3.3.4.

3.3.1. Matching phase

In this section, the process of the matching phase for obtaining the similarity scores between all binary files is described. If FileB is a different variant of FileA, their similarity exceeds a given threshold. To calculate the similarity score between the behaviors of two programs, the Jaccard similarity, defined in Eq. (1), is employed:

Δ(FileA, FileB) = |FileA ∩ FileB| / |FileA ∪ FileB|    (1)

In general, Jaccard similarity is a statistical measure used for comparing the similarity of samples. The numerator counts the features shared between the two samples, and the denominator is the size of the union of the two sample sets. The motivation for using this formula is that the degree of similarity is relative to the percentage of shared features; for example, malware files that share more common features are more similar than those that do not. To carry out dynamic VSA on FileA and FileB, the register value vectors of the two files are obtained first. Then the similarity score between each register of FileA and the corresponding register of FileB is computed based on Eq. (1). This value ranges from 0 to 1, where 0 means that the two executables are completely different (i.e., their value sets are disjoint) and 1 means that the executables are equivalent. For example, to calculate the similarity score between the value vectors EAX_A and EAX_B, first two vectors VA and VB for the values of EAX_A and EAX_B are generated, respectively. Then the Jaccard distance between VA and VB is computed: the intersection of VA and VB is the count of values they have in common, and the union of VA and VB is the count of all unique values of the two vectors. These steps are repeated for the other registers of FileA and FileB, and the mean of the similarity scores over the registers is taken as the similarity score between the two files. When an unknown file enters the system, the value vectors of its registers are generated and the similarity score between the new file and every other file in the dataset is calculated. The file with the highest similarity specifies the classification label of the test file. To classify a new file as malicious or benign, the file has to be compared with all dataset files, so the number of necessary comparisons is of order |n|, where n is the dataset size. To deal with this problem, the prototype extraction method is proposed in Section 3.3.3.
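As a minimal illustration of this matching step (the register subset, helper names and dict-of-sets layout are assumptions of this sketch, not the authors' code):

```python
# Sketch of Eq. (1) and the mean-over-registers file similarity.
REGISTERS = ["EAX", "EBX", "ECX", "EDX"]  # assumed register subset

def jaccard(a, b):
    """Eq. (1): |A intersect B| / |A union B| over two sets of values."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def file_similarity(file_a, file_b):
    """Mean Jaccard similarity over the register value sets of two files.
    file_a/file_b map a register name to its set of run-time values."""
    scores = [jaccard(file_a[r], file_b[r]) for r in REGISTERS]
    return sum(scores) / len(scores)

# Example: value sets {0x1, 0x2, 0x3} and {0x2, 0x3, 0x4} share 2 of 4
# unique values, so their Jaccard similarity is 2/4 = 0.5.
```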


Fig. 4. VSA idea.

Fig. 5. The new representation of a trace report.

Fig. 6. Value vector of EAX register.

3.3.2. Feature selection

Polymorphism techniques like junk code insertion create a lot of features in the behavior monitoring phase. We want to extract important features, because some features are inserted into binaries only to make them difficult to detect. Many features are not discriminative for malware detection; the preprocessing eliminates irrelevant data extracted from the outputs of the API monitoring software. As mentioned in Section 3.2, each value of each register is considered a feature. The observed number of register values is large; therefore, values that are irrelevant and less discriminative should be eliminated. In the first stage, we created a frequency table that records the number of samples in which a specific feature was observed during behavioral monitoring. Next, features with a frequency below a specific threshold are removed. For example, suppose our dataset includes 5 log files and we want to extract the EAX features. For this purpose, all unique values of EAX are extracted from all malware and benign binaries of the dataset, and a table similar to Table 2 is generated to count the frequency of the features. Suppose features are selected when their frequency reaches the threshold Γv = 4; then f1, f2, f4 and f6 are selected as important features. For example:

log files = {File1, File2, File3, File4, File5}
File1_EAX = {f1, f2, f5, f6, f10, f9}
File2_EAX = {f3, f4, f2, f6, f1, f7}
File3_EAX = {f1, f5, f8, f4, f2, f3}
File4_EAX = {f4, f4, f2, f3, f5, f6, f1}
File5_EAX = {f3, f4, f6, f2, f3}

Table 2
Frequency table.

Feature name    Frequency
f1              4
f2              5
f3              3
f4              4
f5              3
f6              4
f7              1
f8              1
f9              2
f10             1

After omitting irrelevant features, the proposed system uses the Chi-Square test as a feature selection method to select a smaller set of features. This provides an efficient method that reduces the processing time. The Chi-Square testing procedure attempts to isolate the values that are most strongly associated with certain malware classes (Zheng et al., 2004; Sathyanarayan et al., 2008; Osaghae and Chiemeke, 2012). Chi-Square measures the lack of independence between a term and a category and can be compared to the chi-square distribution with one degree of freedom to judge extremeness. The absence or presence of each feature is investigated in all dataset binaries: if the selected feature is present in the log file, the value of that feature is set to 1, otherwise it is set to 0, and a binary vector for the file is generated. The set of created binary vectors is then used as input to the Chi-Square testing procedure.
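A minimal sketch of this two-stage selection, using scikit-learn's chi2 as a stand-in for the Chi-Square procedure; the toy data below approximates the Table 2 example and the labels are hypothetical:

```python
# Stage 1: frequency threshold; stage 2: Chi-Square ranking of the
# surviving binary presence/absence features.
from collections import Counter

import numpy as np
from sklearn.feature_selection import chi2

def frequency_filter(file_value_sets, min_count):
    """Keep register values that appear in at least min_count files."""
    counts = Counter(v for values in file_value_sets for v in set(values))
    return sorted(v for v, c in counts.items() if c >= min_count)

# Toy data approximating the Table 2 example.
files = [
    {"f1", "f2", "f5", "f6", "f10", "f9"},
    {"f3", "f4", "f2", "f6", "f1", "f7"},
    {"f1", "f5", "f8", "f4", "f2", "f3"},
    {"f4", "f2", "f3", "f5", "f6", "f1"},
    {"f4", "f6", "f2"},
]
kept = frequency_filter(files, min_count=4)  # -> ['f1', 'f2', 'f4', 'f6']

# Binary vectors over the kept features, then Chi-Square ranking against
# hypothetical class labels (1 = malware, 0 = benign).
X = np.array([[1 if f in fs else 0 for f in kept] for fs in files])
y = np.array([1, 1, 1, 0, 0])
scores, p_values = chi2(X, y)
ranked = [f for _, f in sorted(zip(scores, kept), reverse=True)]
```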

3.3.3. Prototype extraction

Prototypes are a small set of files serving as representative samples of the whole dataset, which provide an acceptable approximation in pair-wise distance analysis. When a new file enters the system, it is compared with just the prototypes instead of the whole training dataset. The necessary comparison of the matching phase in Section 3.3.1 has a time complexity of order |n|, but selecting K prototypes instead of all files reduces the number of comparisons to order |K|. The algorithm of prototype extraction is shown in Fig. 7. According to Fig. 7, first a file is selected randomly and added to the prototype set (Lines 1 and 2). Then, samples of the dataset that have the farthest distance from the samples already in the prototype set are selected and inserted into the prototype set. Γp is the threshold that specifies the maximum similarity score between two files: to find the farthest files of the dataset, the similarity score Δ must be less than the threshold Γp (Line 6). The smaller the similarity score Δ between two files, the farther they are from each other. Lines 4 to 12 make sure that the selected file has the necessary maximum distance from all samples in the prototype set, and this operation is repeated for all log files of the dataset (Line 3). Therefore, the selected samples have the greatest distance with respect to each other, based on the value of threshold Γp. The number of selected representative samples depends on the value of threshold Γp and on the variety of categories and families of files in the dataset. The algorithm of Fig. 7 has a time complexity of order O(|n| × |K|) to generate a prototype set, where n is the size of the training dataset and K is the size of the prototype set: exploring the whole dataset to find the farthest files requires a loop that traverses all n log files, and, after selecting a file, another loop checks whether the selected file has the necessary maximum distance from all prototypes. In the worst case, this algorithm has a time complexity of n².

3.3.4. Matching phase by prototype set

The matching process using a prototype set is similar to the method presented in Section 3.3.1. Fig. 8 shows the process of the matching phase by prototype set. When an unknown file enters the system, its similarity to the prototype samples is calculated. As mentioned, two files are similar if they achieve the minimum threshold of similarity. If the similarity Δ(prototype, UnknownSample) is at least equal to the threshold Γd, the binary is labeled based on the category of that prototype (i.e., malware or benign) and the label is stored in the labelArray. The labelArray is an array that stores the class labels of the prototypes that are similar to the new file. The process is repeated until the distances between all prototypes and the unknown file are calculated and the labelArray stores all predicted labels. Finally, to find the real nature of the unknown file, a vote is performed on the labelArray values, and the unseen file is classified with the voted label. If the labelArray has an even length and the numbers of malware labels and benign labels turn out to be equal, a weighted voting method is deployed: the mean similarity score of the labels of each class is computed, and the class that gets the higher value is selected as the label of the new file. The run-time complexity of the necessary comparison is O(K), where K is the number of prototypes.
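The extraction and matching logic can be sketched as follows; this is a greedy farthest-first sketch under our reading of Figs. 7 and 8, and the "benign by default" fallback when no prototype passes Γd is our assumption, since the paper does not spell out that corner case. A similarity function such as the file_similarity sketch above can be passed in.

```python
import random

def extract_prototypes(dataset, gamma_p, similarity):
    """Fig. 7 sketch: greedily keep samples whose similarity to every
    prototype chosen so far stays below gamma_p, so prototypes end up
    far apart. dataset holds (register_value_sets, label) pairs."""
    prototypes = [random.choice(dataset)]
    for sample in dataset:
        if all(similarity(sample[0], p[0]) < gamma_p for p in prototypes):
            prototypes.append(sample)
    return prototypes

def match(unknown, prototypes, gamma_d, similarity):
    """Fig. 8 sketch: vote among prototypes whose similarity to the
    unknown file reaches the detection threshold gamma_d."""
    votes = []
    for regs, label in prototypes:
        score = similarity(unknown, regs)
        if score >= gamma_d:
            votes.append((score, label))
    if not votes:
        return "benign"  # assumed fallback; not specified in the paper
    mal = [s for s, lbl in votes if lbl == "malware"]
    ben = [s for s, lbl in votes if lbl == "benign"]
    if len(mal) != len(ben):
        return "malware" if len(mal) > len(ben) else "benign"
    # Tie: weighted voting using the mean similarity score per class.
    mal_mean = sum(mal) / len(mal) if mal else 0.0
    ben_mean = sum(ben) / len(ben) if ben else 0.0
    return "malware" if mal_mean > ben_mean else "benign"
```

With K prototypes, labeling a new file needs only K comparisons instead of n, which is the speed-up claimed in Section 3.3.3.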

4. Evaluation

In this section, the dataset used, the selection of DLLs for the binaries, and the measurement criteria are described, along with the experimental evaluations of the proposed method.

4.1. Dataset and experimental setting

A dataset including malware and benign files is used. The distribution of malware and benign PEs in the dataset is described in Section 4.1.1. Some configurations performed in the hooking tool to omit unnecessary DLLs are explained in Section 4.1.2. The evaluation measures are explained briefly in Section 4.1.3, and the performance of the proposed system on these measures is presented in Section 4.2.

4.1.1. Dataset

To evaluate the proposed method, 850 malware programs and 390 benign files are used. The benign ones are programs installed under the Windows XP program files folders, Windows system files and a wide range of portable benign tools. The malicious samples include Constructor, Trojan, Virus, Backdoor, etc. The distribution of this dataset is illustrated in Fig. 9. A large number of samples spanning various families of malware binaries helps guarantee that the results obtained are reliable. The list of execution traces and the dataset can be found at: http://home.shirazu.ac.ir/~sami/malware.

4.1.2. Essential API calls for detecting malicious behavior

In this study, unnecessary DLLs are eliminated based on previous work by Karbalaee et al. (2012). The common DLLs are advapi32.dll, kernel32.dll, ntdll.dll, user32.dll, wininet.dll and ws2_32.dll. The importance of these DLLs for malware detection is recognized (Langerud, 2008), as described in Table 3.

Fig. 7. The process of prototype extraction.

Fig. 8. The matching phase using the prototype set.


Fig. 9. Our experimental dataset.


We extracted the most discriminative API calls from these DLLs based on five categories: File System Access, Registry Access, System Information, Processes, and Networking (Wagener, 2006). Table 4 describes these categories briefly. Finally, the 126 API calls of the six DLLs that are most discriminative in diagnosing malicious activity are selected (Ghiasi et al., 2013), as shown in Appendix A.

4.1.3. Evaluation measures

Precision is the percentage of predicted malware cases that are actually malware. Recall is the percentage of actual malware cases detected as malware by the model. Precision and recall are defined in Eqs. (2) and (3), respectively:

Precision = TP / (TP + FP)    (2)

Recall = TP / (TP + FN)    (3)

where TP is the number of malicious files correctly classified, FP is the number of benign files incorrectly classified as malicious, and FN is the number of malicious files incorrectly classified as benign. For malware detection on a large dataset, we must try to maximize the precision and recall of the malware class. To this end, an aggregated performance score, denoted F-Measure, which combines precision and recall, is considered for our evaluation. The F-Measure is defined in Eq. (4):

F-Measure = (2 × Precision × Recall) / (Precision + Recall)    (4)

The F-Measure score can be interpreted as a weighted average of precision and recall; a high precision or recall results in a high F-Measure. Accuracy is also used as a performance measure of how well the model correctly identifies malware samples. The formula of accuracy, which is the proportion of true results (both true positives and true negatives) in the dataset, is shown in Eq. (5):

Accuracy = (TP + TN) / (TP + FP + TN + FN)    (5)
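These four measures transcribe directly into code; the counts in the example call below are hypothetical, chosen only to match the dataset sizes:

```python
# Eqs. (2)-(5) transcribed directly.
def evaluation_measures(tp, fp, tn, fn):
    precision = tp / (tp + fp)                                  # Eq. (2)
    recall = tp / (tp + fn)                                     # Eq. (3)
    f_measure = 2 * precision * recall / (precision + recall)   # Eq. (4)
    accuracy = (tp + tn) / (tp + fp + tn + fn)                  # Eq. (5)
    return precision, recall, f_measure, accuracy

# Hypothetical counts for a 850-malware / 390-benign split.
print(evaluation_measures(tp=820, fp=18, tn=372, fn=30))
```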

Accuracy is appropriate for data mining experiments with balanced datasets. Our dataset is imbalanced, since about 30% of the dataset is benign; Weng and Poon (2008) showed that precision, recall and F-Measure are good measures for imbalanced datasets.

4.2. Experiments and discussion

The goal of this evaluation is to demonstrate that a robust system for identifying malware samples has been developed. The impact of the values of the various registers on the overall result was explored in previous work (Ghiasi et al., 2013); this work shows that our method can extract a prototype set that makes the matching process more efficient while maintaining the accuracy. The 126 selected API calls of the six important DLLs that record the binary behaviors implicitly affected the performance of the experiments. The matching results for the EAX register in the preliminary work (Ghiasi et al., 2013) showed that EAX register values are more discriminative in categorizing samples than the values of the other registers, with a 96.1% detection rate and 4% false positive. EAX is a 32-bit general-purpose integer register and is the accumulator register: it is used for I/O port access, arithmetic operations, interrupt calls, logic, data transfer, multiplication, division, etc. (Fog, 2012). Also, the return value of a function call is placed in the 4 bytes of EAX (Potier, 2011). This means that EAX values are sufficient for identifying malware samples, and these values can therefore serve as an appropriate filter to reduce the value sets of all the other registers. EAX register values are used for extracting the important values of the other registers. According to the method presented in Section 3.3.2, some of the best values of the registers are selected. A large number of experiments with different values of threshold Γv were performed to select the important values of the EAX register; the threshold values Γv ranged from 0.5% to 30% with intervals of 0.5%. We want to find the case in which the number of features is acceptable while accuracy decreases only slightly; these features should be discriminative for malware detection. After performing the experiments, we selected those register values that appear in at least 98% of the dataset files: in all experiments, the important values of the EAX register were selected with a threshold Γv equal to 2%. Afterward, the values of the other registers are extracted whose corresponding EAX value appears in the selected invoked API calls after removing the irrelevant EAX features. In the next step, the Chi-Square test was used to select a smaller set of features of each

Table 3
A summary of the selected DLLs.

Kernel32.dll: Performs low-level operating system functions, such as memory management, input/output operations, process and thread creation, and synchronization (Langerud, 2008).
User32.dll: Manages Windows user components, such as creating and manipulating the standard elements of the Windows user interface, the desktop, windows and menus (Langerud, 2008).
Advapi32.dll: Provides access to additional kernel functionality, such as the Windows registry, system shutdown/restart (or abort), starting/stopping/creating Windows services, and managing user accounts (Langerud, 2008).
Ntdll.dll: Exports the Windows Native API (Langerud, 2008).
Wininet.dll: Protocol handler for HTTP, HTTPS and FTP (Langerud, 2008).
WS2_32.dll: Contains the Windows Sockets API used by most internet and network applications to handle network connections (Langerud, 2008).

Table 4
Description of the API categories selected for malware detection.

File System Access: Malware tampers with the file system to perform its malicious operations.
Registry Access: Malware tampers with the registry to access passwords, execute at boot time, etc.
System Information: Malware inspects system information to ensure the environment suits its intent.
Processes: Malware uses threads to communicate with another process's address space.
Networking: Malware performs its malicious intentions on the local network or the internet.


vector. Chi-Square estimates the rank of features according to how well their values distinguish between samples. Hence, the newly generated register vectors include fewer, more important elements: the processing speed is increased and the detection accuracy is preserved. The next step extracts prototypes from the whole dataset. For this purpose, it is necessary to perform several experiments to find the best threshold Γp. The prototype extraction process used 10-fold cross-validation: the dataset is divided into 10 folds; each time, 9 folds are selected as the training set and prototype samples are extracted from them. The process is repeated 10 times and several threshold Γp values are examined. Table 5 shows several Γp values and the average number of prototypes extracted at each threshold Γp over the 10 steps of cross-validation.

Table 5
Prototype numbers at each threshold Γp.

Threshold Γp (%)    Number of prototypes
10                  8
13                  17
15                  29
17                  49
20                  91

As shown in Table 5 and Fig. 7, increasing the threshold Γp selects more files as prototypes, because a larger domain for finding representative prototypes is explored; if Γp is set to a small value, a small domain is explored and the number of prototypes is small. During the matching phase, several tests were performed with different values of Γp and Γd. Threshold Γd is the minimum similarity score used for comparing an unknown file with the prototype samples. After extracting different prototypes with different values of Γp, unknown files in the test


dataset were compared with the extracted prototypes along with different thresholds Γd. Different Γd values were considered, from 5% to 30% with intervals of 1%. Since many experiments were performed, the results are expressed over averaged threshold Γd ranges: 5–10%, 11–15%, 16–20%, 21–25% and 26–30%. To evaluate the proposed system, the True Positive rate (TP), False Positive rate (FP), F-Measure, Accuracy, Recall and Precision are measured. Table 6 shows the results for these measures: test dataset files are compared with each prototype set extracted with a specific Γp, and for each threshold Γp, different thresholds Γd from 5% to 30% are examined. Figs. 10–12 illustrate the results for the F-Measure, accuracy and FP rate, respectively. In Fig. 10 the horizontal axis represents the average of different thresholds Γd and the vertical axis displays the F-Measure for each threshold Γp. The results presented in Fig. 10 show that if Γp is equal to 20%, the highest F-Measure, close to 98%, and a false positive close to zero are obtained. This could be due to the large number of prototype samples at a threshold Γp of 20%. When the threshold Γp is 10%, the lowest F-Measure is achieved, with eight prototype samples on average. With the average threshold Γd of 5–10% and Γp equal to 10%, an accuracy close to 88% was achieved in classifying malware samples from benign. The result in this case is very interesting, because with only 8 prototypes, 88% of malware samples are identified. This shows that the method of extracting prototypes works very well; it also enhances the speed of malware detection while maintaining the accuracy. In Figs. 11 and 12 the horizontal axis represents the average of different thresholds Γd; the vertical axis of Fig. 11 displays the accuracy rate and the vertical axis of Fig. 12 illustrates the false positive rate for each threshold Γp. According to Fig. 12, if Γp is equal to 20%, the lowest false positive over all thresholds Γd is obtained, and in the best case it is close to 0.
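The threshold sweep described above can be organized as a simple grid search; the self-contained sketch below uses toy stand-ins for the data and similarity (none of it is the authors' code) only to show the shape of the experiment:

```python
# Toy grid search over (gamma_p, gamma_d): extract prototypes from a
# training split, then measure accuracy on a held-out split.
import random

random.seed(0)

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

# Toy dataset: (value_set, label) pairs standing in for register values.
data = [({random.randint(0, 40) for _ in range(12)},
         "malware" if i % 3 else "benign") for i in range(60)]

def prototypes(train, g_p):
    protos = [train[0]]
    for s in train:
        if all(jaccard(s[0], p[0]) < g_p for p in protos):
            protos.append(s)
    return protos

def predict(x, protos, g_d):
    votes = [lbl for vals, lbl in protos if jaccard(x, vals) >= g_d]
    return max(set(votes), key=votes.count) if votes else "benign"

train, test = data[:48], data[48:]
for g_p in (0.10, 0.13, 0.15, 0.17, 0.20):
    for g_d in (0.05, 0.10, 0.20, 0.30):
        protos = prototypes(train, g_p)
        acc = sum(predict(x, protos, g_d) == y for x, y in test) / len(test)
        print(f"gamma_p={g_p:.2f} gamma_d={g_d:.2f} K={len(protos)} acc={acc:.2f}")
```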

Table 6
Predictive performance of the proposed system.

Γp     Γd (%)   TP     FP     Recall  Precision  F-Measure  Accuracy
10%    5–10     0.851  0.098  0.851   0.897      0.873      0.880
10%    11–15    0.851  0.090  0.851   0.904      0.877      0.881
10%    16–20    0.829  0.121  0.829   0.873      0.850      0.860
10%    21–25    0.839  0.156  0.839   0.843      0.841      0.848
10%    26–30    0.857  0.160  0.857   0.843      0.854      0.849
13%    5–10     0.943  0.042  0.943   0.957      0.950      0.958
13%    11–15    0.941  0.048  0.941   0.951      0.946      0.950
13%    16–20    0.963  0.045  0.963   0.955      0.959      0.965
13%    21–25    0.895  0.065  0.895   0.932      0.913      0.920
13%    26–30    0.933  0.100  0.933   0.903      0.920      0.917
15%    5–10     0.957  0.058  0.957   0.943      0.950      0.958
15%    11–15    0.956  0.046  0.956   0.954      0.955      0.960
15%    16–20    0.955  0.037  0.955   0.963      0.959      0.965
15%    21–25    0.902  0.059  0.902   0.939      0.920      0.930
15%    26–30    0.946  0.110  0.946   0.896      0.925      0.918
17%    5–10     0.941  0.024  0.941   0.975      0.958      0.960
17%    11–15    0.952  0.031  0.952   0.968      0.960      0.967
17%    16–20    0.941  0.021  0.941   0.978      0.959      0.965
17%    21–25    0.916  0.060  0.916   0.939      0.927      0.930
17%    26–30    0.964  0.120  0.964   0.889      0.930      0.922
20%    5–10     0.944  0.002  0.944   0.998      0.967      0.975
20%    11–15    0.946  0.004  0.946   0.996      0.970      0.978
20%    16–20    0.950  0.007  0.950   0.993      0.971      0.975
20%    21–25    0.963  0.049  0.963   0.952      0.957      0.955
20%    26–30    0.988  0.125  0.988   0.888      0.940      0.930


Table 7
Accuracy of our system in comparison to some updated anti-viruses.

                ESET NOD32   Kaspersky   Avira   McAfee   Our detection (average)   Our detection (best)
Detection rate  0.878        0.976       0.920   0.964    0.959                     0.978

Fig. 10. F-Measure rate of different thresholds Γd and different thresholds Γp.

Fig. 11. Accuracy rate of different thresholds Γd and different thresholds Γp.

Table 8
The result of implementing the method of Tian et al. (2010) on our dataset.

       TP     FP     Recall  Precision  F-Measure  Accuracy
RF     0.946  0.132  0.946   0.939      0.942      0.921
J48    0.938  0.145  0.938   0.933      0.935      0.912
SMO    0.935  0.151  0.935   0.930      0.932      0.907
BLR    0.937  0.174  0.937   0.920      0.929      0.902

threats, but also to do so with a reasonable false alarm ratio: if benign files are mistakenly detected as malicious and then removed, the computer system malfunctions. In comparison with the API feature set, our method shows a significant and consistent improvement in F-Measure and accuracy: accuracy improves by about 4%, while the false alarm rate decreases by about 11%. Regarding the results depicted in Table 8, the API name alone is not a suitable feature set, judging by its false positive rate.

5. Future work and conclusion

Fig. 12. False positive rate of different thresholds Γd and different thresholds Γp.

By comparing the graphs in Figs. 10–12, it is clear that the threshold Γp of 13% and Γd of 16–20%, with 96.5% accuracy and 4.5% false positive on average, is appropriate for malware detection; our goal is to get an acceptable accuracy with the least number of prototype samples. Our experimental results, based on a dataset of 850 malware and 390 benign files, reached an average accuracy of over 95.9% in distinguishing malware from benign. Our results outperformed comparable, updated, common anti-viruses; a comparison of the accuracy rates of different anti-viruses on the same dataset is shown in Table 7. We implemented the method of Tian et al. (2010), mentioned in the related work, to compare with our proposed system. In this implementation, we consider each distinct API call extracted from the dynamic log files as a feature. The presence or absence of each feature is checked and a binary vector is created for all samples of our dataset. Then, several well-known classifiers, such as random forest, J48, SMO and Bayesian logistic regression (BLR), are trained. The classifiers are evaluated using 10-fold cross-validation in all experiments to avoid over-fitting. Table 8 compares the achieved results in more detail when the four classifiers are applied to the API name feature set of the dataset, following the implementation of the Tian et al. (2010) study. An appropriate malware recognition approach is not only expected to be capable of distinguishing almost all unwanted

In this paper, a technique to reliably and efficiently classify binary files into malware or benign is proposed. A dynamic analysis of API calls is performed and the malicious functionality of the binaries is recorded in a controlled environment. The proposed method computes the similarity distance between two binaries based on their register value sets. The concept of prototypes allows efficient classification and speeds up the matching procedure: when a new file enters the system, it is compared with the prototypes instead of all dataset files. Empirical results demonstrated that binary files are classified with a high accuracy of 96% and a 4.5% false positive rate based on their run-time value sets. In future work, we will extend our method to an incremental approach for behavior-based analysis.

Acknowledgment

We would like to thank the referees and the editor for a number of helpful comments and suggestions. This research was conducted in the Department of Electrical Engineering and Computer Science of Shiraz University and we would like to express our sincere thanks to the Iran Telecommunication Research Center (ITRC) for their support.

Appendix A

The 126 selected API calls, grouped by DLL:

Advapi32.dll: GetSecurityDescriptorDacl, RegCloseKey, RegCreateKeyA, RegCreateKeyExA, RegCreateKeyExW, RegCreateKeyW, RegDeleteKeyA, RegDeleteKeyW, RegEnumValueW, RegLoadKeyA, RegLoadKeyW, RegOpenCurrentUser, RegOpenKeyA, RegOpenKeyExA, RegOpenKeyExW, RegOpenKeyW, RegQueryValueExW, RegSaveKeyA, RegSaveKeyExA, RegSaveKeyExW, RegSaveKeyW, RegSetValueExA, SaferIdentifyLevel.

Kernel32.dll: BaseInitAppcompatCache, CloseHandle, CopyFileA, CopyFileExA, CopyFileExW, CopyFileW, CreateDirectoryA, CreateDirectoryExA, CreateDirectoryExW, CreateDirectoryW, CreateFileA, CreateFileMappingA, CreateFileW, CreateProcessA, CreateProcessW, CreateRemoteThread, CreateThread, DeleteFileA, DeleteFileW, DeviceIoControl, ExitProcess, ExitThread, FindFirstFileA, FindFirstFileExA, FindFirstFileExW, FindFirstFileW, FindNextFileA, FindNextFileW, FindResourceExA, FreeResource, GetCommandLineW, GetConsoleOutputCP, GetModuleFileNameA, GetPrivateProfileStringA, GetStartupInfoW, GetSystemDirectoryA, GetVersion, GetWindowsDirectoryA, GlobalReAlloc, IsValidLocale, LoadLibraryA, LoadLibraryW, LocalFree, MoveFileA, MoveFileExA, MoveFileExW, MoveFileW, OpenProcess, Process32Next, ReadFile, ReadFileEx, RemoveDirectoryA, RemoveDirectoryW, SetEndOfFile, SetFileAttributesA, TerminateProcess, WriteConsoleA, WriteFile, WriteFileEx, WriteProcessMemory.

Ntdll.dll: NtCreateKey, NtDeleteValueKey, NtEnumerateKey, NtEnumerateValueKey, NtFlushKey, NtNotifyChangeKey, NtQueryKey, NtQueryValueKey, NtSaveKey, NtSetValueKey, NtUnloadKey.

User32.dll: CharUpperW, CreateWindowExA, DefWindowProcA, GetDlgItem, GetFocus, GetSystemMetrics, GetWindow, GetWindowThreadProcessId, LoadImageA, LoadStringW, SendDlgItemMessageA, SetWindowTextW, SystemParametersInfoA, WindowFromPoint.

Wininet.dll: FtpFindFirstFileA, FtpFindFirstFileW, FtpGetFileA, FtpGetFileW, FtpOpenFileA, FtpOpenFileW, FtpPutFileA, FtpPutFileW, HttpOpenRequestA, HttpOpenRequestW, InternetConnectA, InternetConnectW, InternetOpenUrlA, InternetOpenUrlW, InternetReadFile, InternetSetOptionA, InternetSetOptionW, InternetWriteFile.

References Abdulalla, S.M., Kiah, L.M., Zakaria, O., 2010. A biological model to improve PE malware detection: review. Int. J. Phys. Sci. 5 (15), 2236–2247. Alam, S., Horspool, R.N., Traore, I., 2014. MARD: a framework for metamorphic malware analysis and real-time detection. In: Proceedings of the International

122

M. Ghiasi et al. / Engineering Applications of Artificial Intelligence 44 (2015) 111–122

Conference on Advanced Information Networking and Applications. AINA, pp. 480–489.
Alazab, M., Huda, S., Abawajy, J., Islam, R., Yearwood, J., Venkatraman, S., Broadhurst, R., 2014. A hybrid wrapper-filter approach for malware detection. J. Netw. 9 (11), 2878–2891.
Alzarooni, K.M.A., 2012. Malware Variant Detection (Ph.D. thesis). University College London, London.
Baldangombo, U., Jambaljav, N., Horng, S.-J., 2013. A static malware detection system using data mining methods. Int. J. Artif. Intell. Appl. 4 (4), 113.
Bayer, U., Comparetti, P.M., Hlauschek, C., Kruegel, C., Kirda, E., 2009. Scalable behavior-based malware clustering. In: Proceedings of the Network and Distributed System Security Symposium. NDSS, San Diego.
Bayer, U., Kirda, E., Kruegel, C., 2010. Improving the efficiency of dynamic malware analysis. In: Proceedings of the 2010 ACM Symposium on Applied Computing. SAC '10, p. 1871.
Cesare, S., Xiang, Y., Zhou, W., 2013. Malwise: an effective and efficient classification system for packed and polymorphic malware. IEEE Trans. Comput. 62 (6), 1193–1206.
Chandramohan, M., Tan, H.B.K., 2012. Detection of mobile malware in the wild. IEEE Comput. 45 (9), 65–71.
Chandramohan, M., Tan, H.B.K., Briand, L.C., Shar, L.K., Padmanabhuni, B.M., 2013. A scalable approach for malware detection through bounded feature space behavior modeling. In: Proceedings of the 28th IEEE/ACM International Conference on Automated Software Engineering. ASE 2013, pp. 312–322.
Comparetti, P.M., Salvaneschi, G., Kirda, E., Kolbitsch, C., Kruegel, C., Zanero, S., 2010. Identifying dormant functionality in malware programs. In: Proceedings of the IEEE Symposium on Security and Privacy, pp. 61–76.
Devesa, J., Santos, I., Cantero, X., Penya, Y.K., Bringas, P.G., 2010. Automatic behaviour-based analysis and classification system for malware detection. In: Proceedings of the 12th International Conference on Enterprise Information Systems, pp. 395–399.
Elhadi, A., Maarof, M., Osman, A., 2012. Malware detection based on hybrid signature behaviour application programming interface call graph. Am. J. Appl. Sci. 9 (3), 283–288.
Faruki, P., Laxmi, V., Gaur, M., Vinod, P., 2012. Mining control flow graph as API call-grams to detect portable executable malware. In: Proceedings of the 5th International Conference on Security of Information and Networks. SIN '12, pp. 130–137.
Fog, A., 2012. Function calling conventions. In: Calling Conventions for Different C++ Compilers and Operating Systems. Copenhagen, Denmark.
Fredrikson, M., Jha, S., Christodorescu, M., Sailer, R., Yan, X., 2010. Synthesizing near-optimal malware specifications from suspicious behaviors. In: Proceedings of the 2010 IEEE Symposium on Security and Privacy, pp. 45–60.
Ghiasi, M., Sami, A., Salehi, Z., 2013. DyVSoR: dynamic malware detection based on extracting patterns from value sets of registers. ISC Int. J. Inf. Secur. 5 (1), 71–82.
Hegedus, J., Miche, Y., Ilin, A., Lendasse, A., 2011. Methodology for behavioral-based malware analysis and detection using random projections and K-Nearest Neighbors classifiers. In: Proceedings of the 7th International Conference on Computational Intelligence and Security. CIS 2011, pp. 1016–1023.
Hu, X., Chiueh, T., Shin, K.G., 2009. Large-scale malware indexing using function-call graphs. In: Proceedings of the 16th ACM Conference on Computer and Communications Security, pp. 611–620.
Jang, J., Brumley, D., Venkataraman, S., 2011. BitShred: feature hashing malware for scalable triage and semantic analysis. In: Proceedings of the 18th ACM Conference on Computer and Communications Security, pp. 309–320.
Karbalaee, F., Sami, A., Ahmadi, M., 2012. Semantic malware detection by deploying graph mining. IJCSI Int. J. Comput. Sci. 9 (1), 373–379.
Langerud, T., 2008. PowerScan: A Framework for Dynamic Analysis and Anti-Virus Based Identification of Malware (Master's thesis). Norwegian University of Science and Technology, Department of Telematics, Norway.
Lanzi, A., Balzarotti, D., Kruegel, C., Christodorescu, M., Kirda, E., 2010. AccessMiner: using system-centric models for malware protection. In: Proceedings of the 17th ACM Conference on Computer and Communications Security, pp. 399–412.
Leder, F., Steinbock, B., Martini, P., 2009. Classification and detection of metamorphic malware using value set analysis. In: Proceedings of the 4th International Conference on Malicious and Unwanted Software. MALWARE 2009, pp. 39–46.
Li, P., Liu, L., Gao, D., Reiter, M.K., 2010. On challenges in evaluating malware clustering. In: Proceedings of the 13th International Conference on Recent Advances in Intrusion Detection. RAID '10, pp. 238–255.
Liu, K., Tan, H.B.K., Chen, X., 2013. Binary code analysis. IEEE Comput. 46 (8), 60–68.
Macedo, H.D., Touili, T., 2013. Mining malware specifications through static reachability analysis. Lect. Notes Comput. Sci. 8134, 517–535.

McAfee Labs, 2014. McAfee Labs Threats Report, August 2014.
Moser, A., Kruegel, C., Kirda, E., 2007. Limits of static analysis for malware detection. In: Proceedings of the 23rd Annual Computer Security Applications Conference. ACSAC 2007, pp. 421–430.
Osaghae, E., Chiemeke, S.C., 2012. A model checking framework for developing scalable antivirus systems. Afr. J. Comput. ICT 5 (3), 37–48.
PandaLab, 2014. Quarterly Report Q2 2014.
Potier, J., 2011. Where is located the return value? [Online]. Available at: 〈http://jacquelin.potier.free.fr/winapioverride32/doc/faq.htm#returnvalue〉.
Potier, J., 2013. WinAPIOverride32 [Online]. Available at: 〈http://jacquelin.potier.free.fr/winapioverride32/〉.
Qiao, Y., Yang, Y., He, J., Tang, C., Liu, Z., 2014. CBM: free, automatic malware analysis framework using API call sequences. Adv. Intell. Syst. Comput. 214, 225–236.
Qiao, Y., Yang, Y., Ji, L., He, J., 2013. Analyzing malware by abstracting the frequent itemsets in API call sequences. In: Proceedings of the 12th IEEE International Conference on Trust, Security and Privacy in Computing and Communications, pp. 265–270.
Rieck, K., Trinius, P., Willems, C., Holz, T., 2011. Automatic analysis of malware behavior using machine learning. J. Comput. Secur. 19 (4), 639–668.
Salehi, Z., Sami, A., Ghiasi, M., 2014. Using feature generation from API calls for malware detection. Comput. Fraud Secur. 2014 (9), 9–18.
Santos, I., Pedrero, X.U., Brezo, F., Bringas, P.G., 2013. NOA: an information retrieval based malware detection system. Comput. Inform. 32 (1), 145–174.
Sathyanarayan, V.S., Kohli, P., Bruhadeshwar, B., 2008. Signature generation and detection of malware families. In: Mu, Y., Susilo, W., Seberry, J. (Eds.), Proceedings of the 13th Australasian Conference on Information Security and Privacy. ACISP 2008, pp. 336–349.
Saxena, P., 2007. Static Binary Analysis and Transformation for Sandboxing Untrusted Plugins (Master's thesis). Stony Brook University, New York, NY.
Shankarapani, M.K., Ramamoorthy, S., Movva, R.S., Mukkamala, S., 2011. Malware detection using assembly and API call sequences. J. Comput. Virol. 7 (2), 107–119.
Skaletsky, A., Devor, T., Chachmon, N., Cohn, R.S., Hazelwood, K.M., Vladimirov, V., Bach, M., 2010. Dynamic program analysis of Microsoft Windows applications. In: Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software. ISPASS, pp. 2–12.
Song, F., Touili, T., 2012. Efficient malware detection using model-checking. Lect. Notes Comput. Sci. 7436, 418–433.
Symantec Corporation, 2014. Internet Security Threat Report 19.
Szor, P., 2005. The Art of Computer Virus Research and Defense. Pearson Education, Addison-Wesley Professional, Michigan.
Tahan, G., Rokach, L., Shahar, Y., 2012. Mal-ID: automatic malware detection using common segment analysis and meta-features. J. Mach. Learn. Res. 13 (1), 949–979.
Tan, H.B.K., Shar, L.K., 2012. Scalable malware clustering through coarse-grained behavior modeling. In: Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering. FSE 2012.
Tian, R., 2011. An Integrated Malware Detection and Classification System (Ph.D. thesis). Deakin University, Melbourne, Australia.
Tian, R., Islam, R., Batten, L., Versteeg, S., 2010. Differentiating malware from cleanware using behavioural analysis. In: Proceedings of the 5th IEEE International Conference on Malicious and Unwanted Software. MALWARE 2010, pp. 23–30.
Wagener, G., 2006. Development and Design of a Process and a Piece of Software to Analyze Unknown Software. University of Luxembourg, Luxembourg.
Wagener, G., State, R., Dulaunoy, A., 2008. Malware behaviour analysis. J. Comput. Virol. 4 (4), 279–287.
Weng, C.G., Poon, J., 2008. A new evaluation measure for imbalanced datasets. In: Proceedings of the 7th Australasian Data Mining Conference. AusDM 2008, pp. 27–32.
Xiao, X., Ding, Y., Zhang, Y., Tang, K., Dai, W., 2013. Malware detection based on objective-oriented association mining. In: Proceedings of the 2013 International Conference on Machine Learning and Cybernetics, pp. 375–380.
Yahyazadeh, M., Abadi, M., 2012. BotOnus: an online unsupervised method for botnet detection. ISC Int. J. Inf. Secur. 4 (1), 51–62.
Zeng, J., Fu, Y., Miller, K.A., Lin, Z., Zhang, X., Xu, D., 2013. Obfuscation resilient binary code reuse through trace-oriented programming. In: Proceedings of the 2013 ACM SIGSAC Conference on Computer and Communications Security. CCS '13, pp. 487–498.
Zheng, Z., Wu, X., Srihari, R., 2004. Feature selection for text categorization on imbalanced data. ACM SIGKDD Explor. Newsl. 6 (1), 80–89.