A Novel Method for Malware Detection on ML-based Visualization Technique
Journal Pre-proof
Xinbo Liu, Yaping Lin, He Li, Jiliang Zhang
PII: S0167-4048(18)31462-7
DOI: https://doi.org/10.1016/j.cose.2019.101682
Reference: COSE 101682
To appear in: Computers & Security
Received date: 22 December 2018
Revised date: 11 October 2019
Accepted date: 26 November 2019
Please cite this article as: Xinbo Liu, Yaping Lin, He Li, Jiliang Zhang, A Novel Method for Malware Detection on ML-based Visualization Technique, Computers & Security (2019), doi: https://doi.org/10.1016/j.cose.2019.101682
© 2019 Published by Elsevier Ltd.
Xinbo Liu (a,b), Yaping Lin (a,b,∗), He Li (a), Jiliang Zhang (a)
(a) The College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
(b) Hunan Provincial Key Laboratory of Trusted System and Networks in Hunan University, Changsha, China
Abstract
Malware detection is one of the challenging tasks in network security. With the rapid growth of network techniques and mobile devices, the threat from malware, such as metamorphic malware, zero-day attacks, and code obfuscation, has become increasingly significant. Many machine learning (ML)-based malware detection methods have been proposed to address this problem. However, given the attacks from adversarial examples (AEs) and the exponential increase in malware variants, malware detection is still an active field of research. To overcome the current limitations, we propose a novel method that uses data visualization and adversarial training on ML-based detectors to efficiently detect different types of malware and their variants. Experimental results on the MS BIG malware database and the Ember database demonstrate that the proposed method is able to prevent zero-day attacks and achieves up to 97.73% accuracy, with 96.25% on average for all the malware tested.
Keywords: Malware Detection, Adversarial Training, Adversarial Examples, Image Texture, Data Visualization
2018 MSC: 00-01, 99-00
∗ Corresponding author. Email address: [email protected] (Yaping Lin)
Preprint submitted to Computers & Security
December 2, 2019
1. Introduction
Malicious software (malware) is a generic term for all unwanted software designed to gain unauthorized access, steal useful information, disrupt normal operation, and adversely affect computers or even mobile devices [1, 2]. Even though only eight major types of malware have been discovered in the real world [3], there were 6,480 and 6,447 publicly disclosed vulnerabilities in 2015 and 2016, respectively, and the number rose to 14,712 in 2017 according to cvedetails¹. Therefore, detecting different kinds of malware, and especially their variants, efficiently and accurately remains a challenge.
Traditionally, signature-based detection is used to detect malware, but its limited scalability restricts its applicability [4]. Static code analysis is another kind of malware detection method, which aims for complete coverage through disassembly; however, it usually suffers from code obfuscation, and the executable files must be unpacked and decrypted before analysis [2]. In contrast to static analysis, dynamic code analysis executes the file in a virtual environment and therefore does not need to unpack or decrypt it, but it is time-intensive and resource-consuming [3]. More importantly, the methods mentioned above are unable to detect specific types of malware whose behavior is well camouflaged or whose trigger conditions are not satisfied.
¹ https://www.cvedetails.com
Recently, malware detection has employed different machine learning (ML) methods to improve detection efficiency [5, 1]. Visualization techniques in ML-based detection methods are especially attractive, because they are not only more efficient in the detection process but also flexible enough to break the restrictions between file formats [6, 7]. However, ML-based detectors are vulnerable to attacks from adversarial examples (AEs) [7, 8]. An AE is a special sample generated from the original dataset with a tiny perturbation, which is able to fool ML-based malware detectors [9]. Under such interference from malware AEs, the detection accuracy of ML-based malware analysis methods is greatly degraded [7, 10]. Even worse, these detectors can be induced to output the opposite result. Fortunately, adversarial training (AT) has been proposed to defend against attacks from AEs. AT is an AE-driven technique that increases the accuracy and robustness of the detection model by augmenting the training data with targeted AEs in the pre-training process [11, 9]. Even though AT/AE techniques have been used for malware detection, such as malware analysis with targeted interference, existing work focused on one specific malware format, such as Android apps or SWF files [7, 12, 13], rather than investigating a universal file format for all potential malware and its variants. To the best of our knowledge, although the AT technique is able to solve the problem of AE perturbation, it has not been used in the area of ML-based malware visualization detection.
In this paper, we propose an AT-based malware visualization detection method, named Visual-AT, which not only improves the detection accuracy in malware analysis but also prevents potential attacks from malware AEs and associated variants. Additionally, the proposed method is suitable for most common kinds of malware files, such as worms, viruses, trojans, and spyware. In Visual-AT, the generated AEs are used to simulate potential malware variants, which are normally disguised as benign samples for traditional ML-based detectors. Meanwhile, by imitating malware variants, Visual-AT both extends the malware dataset and facilitates the extraction of malware features. Besides, we optimize the commonly used AE-generation methods (FGSM and C&W's attack) and image-transformation methods (visualized transformation and normalization). The experimental results on real malware datasets (from the MS BIG database and the EMBER database) show that our method achieves superior performance compared with four recent works and the traditional SVM- and CNN-based detectors in terms of efficiency and accuracy. An accuracy of up to 97.73% is obtained, with 96.25% on average for the malware variants tested. In addition, compared to the normal methods without AE/AT, the proposed method obtains an average increase of 28.12% in detection accuracy and a reduction of 81.24% in false positive rate.
This paper makes the following contributions:
(1) To the best of our knowledge, this paper proposes the first ML-based malware visualization detection method with a suitable model regularization by exploiting both AE and AT techniques.
(2) The proposed method can generate variants and camouflaged forms of malware in order to mitigate novel and probable malware in real detection. This method (Visual-AT) is also a more accurate and efficient detection method for preventing zero-day attacks.
(3) This work carries out a performance analysis of the proposed method and evaluates its accuracy and robustness on real datasets.
The rest of the paper is organized as follows: Sect. 2 surveys the related work on malware detection and adversarial techniques, Sect. 3 describes the proposed method and the correspondingly improved malware detection approach, Sect. 4 shows the experimental results, and Sect. 5 discusses the possibilities for the proposed method to improve accuracy and robustness. Finally, conclusions and future work are presented in Sect. 6.
2. Related work
2.1. Machine Learning based Malware Detection Methods
Machine learning-based malware analysis and detection are active topics in several research efforts. These efforts employ various behavioral features of malware as input to statistical analytical models. By analyzing code or tracing events, analysts can acquire the features needed [14], e.g. system calls, registry accesses, and network traffic. In general, these kinds of sequences are analyzed through supervised (for classification), semi-supervised, and unsupervised (for clustering) learning methods [15, 16].
In recent years, several researchers have used advanced machine learning methods and extracted more information from malware datasets for analysis. As recurrent neural networks (RNNs) have become popular, some researchers have used RNNs for malware detection and classification [17, 18, 14], where the API sequence invoked by a program is used as the input of the RNN. The RNN detector then predicts whether the program is benign or malicious. However, these methods only process the sequence in the forward direction, while some sequential patterns may lie in the backward direction. Bidirectional RNN [19] tries to learn patterns from both directions; that is, an additional backward RNN is used to process the reversed sequence. However, the output probability is computed from the concatenation of the hidden states from both directions, so the computation incurs a substantial overhead with low efficiency. By applying discriminant distance learning, Kong and Yan [20] proposed a method that measures the similarity of the extracted fine-grained features between two malware programs based on the function call graph of each sample. This learning method clusters the malware samples belonging to the same family while keeping different clusters separated by a marginal distance. The weakness of this method is that its detection accuracy relies heavily on the extracted fine-grained features used to compare similarity.
In addition, a different kind of method applies statistical topic modeling approaches to the classification of system call sequences [21], which was further extended with a nonparametric methodology [22]. Subsequently, this method was extended by taking system call arguments as additional information, as well as memory allocation patterns and other traceable operations [23]. Pfoh et al. [24] exploit SVMs with string kernels in a sequence-aware method that is capable of detecting malicious behavior online by classifying system call traces in small sections and keeping a moving average over the probability estimates. The shortcoming of this method is its assumption that a process acting maliciously must interact in some manner with the rest of the system, and that this interaction must take place through the interface provided by the operating system (e.g. system calls). Moreover, Mohaisen and Alrawi [25] introduced an automated system, AMAL, for large-scale malware analysis and classification. AMAL consists of two subsystems: AutoMal, which collects low-granularity behavioral artifacts of malware, and MaLabel, which creates representative features from these artifacts. MaLabel has the ability to tune different learning algorithms for classification, including SVM, K-nearest neighbor, and decision tree. The downside of this automatic system is the unnecessary overhead of running malware samples in virtualized environments.
2.2. ML-based Visualization Detection Methods without Adversarial Technique
The visualization technique reads the binary file of a malware sample as a sequence of 8-bit unsigned integers and visualizes it as a gray-scale image [26, 1]. Each pixel value lies between 0 (black) and 255 (white). According to the scale of the data samples and the analytical requirements, the width of the transformed images can be adjusted appropriately, for instance, 32 for file sizes below 10KB and 64 for file sizes between 10KB and 30KB. Once the width of the transformed image is set, its height varies with the size of the malware. Fig. 1 shows an example of a Trojan downloader from the MS BIG dataset [27], illustrating the visualization process of malware. Additionally, the various primitive binary fragments and their corresponding visual regions (as gray-scale images) exhibit distinctive image textures, e.g. the sections .text, .rdata, .data and .rsrc. Icons might also be included in the application if needed.
ML-based visualization methods have become popular for detecting malware in recent years, such as Convolutional Neural Networks (CNN) [28, 29], Support Vector Machines (SVM) [30, 1], Nearest Neighbours (NN) [1, 4], K-means [31, 32],
Figure 1: Process of Malware Visualization Transformation. (Bytes files → malware binary → binary to 8-bit vector → 8-bit vector to grayscale image, with the .text, .rdata, .data and .rsrc sections appearing as distinct image regions.)
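The byte-to-pixel mapping summarized in Figure 1 can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation: the function names `choose_width` and `bytes_to_grayscale` are hypothetical, the width of 128 for files above 30KB and the zero padding of the last row are assumptions (the text only gives the 32 and 64 tiers), and the sample bytes are made up.

```python
from typing import List

KB = 1024


def choose_width(num_bytes: int) -> int:
    """Return the image width for a file of the given size.

    The 32 and 64 tiers follow the text; the value for larger files
    is an assumed placeholder, since the text gives no further tiers.
    """
    if num_bytes < 10 * KB:
        return 32
    if num_bytes <= 30 * KB:
        return 64
    return 128  # assumption: not specified in the text


def bytes_to_grayscale(raw: bytes) -> List[List[int]]:
    """Read a binary as 8-bit unsigned integers and fold it into rows.

    Each byte becomes one pixel (0 = black, 255 = white). The width is
    fixed by the file size, so the height grows with the file. The last
    row is zero-padded, which is an assumption of this sketch.
    """
    if not raw:
        raise ValueError("cannot visualize an empty file")
    width = choose_width(len(raw))
    padding = (-len(raw)) % width
    pixels = list(raw) + [0] * padding
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


if __name__ == "__main__":
    # Hypothetical bytes; a real run would read them with open(path, "rb").
    sample = bytes([0x94, 0xE8, 0xEA, 0xC7, 0x01, 0xBB, 0x5E, 0xC2, 0x04]) * 500
    image = bytes_to_grayscale(sample)
    print(len(image), "rows x", len(image[0]), "columns")
```

A nested list keeps the sketch dependency-free; in practice the rows would typically be handed to an image library and rescaled to a common size before training.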
Random Forest (RF) [33], Decision Tree (DT) [34, 35], and others. With these different ML-based detectors, this type of method has been proposed for detecting unknown samples or highlighting samples that exhibit unseen behavior for detailed analysis. Lee et al. [36] first transformed malicious code into images to accelerate malware detection. Then, Kong and Yan [20] proposed to classify malicious samples automatically with encoded features. Han et al. [5] analyzed the global features of malware based on binary textures. However, the limitation of these methods is that, because the features are based on a global image, an attacker can adopt countermeasures to beat the system. Then, Makandar et al. [1] constructed texture feature vectors with multi-resolution and wavelet transforms for malware image classification via SVM. Huang et al. [6] introduced the R2-D2 method, which uses color-inspired RGB texture images with a CNN, without extracting pre-selected features. Recently, Kalash et al. [28] exploited a deep learning approach that converts malware binaries to gray-scale images and subsequently trains a CNN framework for further classification. In general, these ML-based visualization methods obtain high accuracy, and their true positive and false positive rates also indicate good robustness [1, 2]. However, the majority of these studies rely only on compression and dimensionality reduction of real data samples to extract malicious features. This brings a serious security threat: if AEs are involved, traditional ML-based detection cannot successfully identify malware [37].
2.3. Adversarial Techniques in Malware Detection
Adversarial technology has developed rapidly in the past three years [38]. Adversarial training (AT) was originally proposed by Szegedy's group [9] to improve the robustness of the discriminative model between training and prediction. However, it requires a computationally expensive inner loop to identify the adversarial direction. To overcome this problem, an optimized definition of adversarial perturbation was proposed by Goodfellow et al. [39, 40] to approximately compute the inner loop. Madry et al. [41] showed that AT can be used to defend against white-box attacks if the perturbation computed during training is close to the maximum of the model loss. Even though a few studies have employed AE and AT for malware detection, none of them relied on the ML-based visualization detection approach. Grosse et al. [7] used AEs for discrete and binary input domains to mislead classifiers on malware samples. However, this attack is only able to handle the specific binary features used in Android malware detection. Maiorca et al. [12] were the first to formally define and quantitatively characterize this vulnerability on an SWF-file-based malware dataset. To mitigate zero-day attacks from malware, Kim et al. [13] proposed a transferred generative adversarial network (tGAN) for automatic classification with visualization-based malware transformation. The methods mentioned above are not able to deal with the exponential growth of malware variants and AEs. For these reasons, and with the increased importance of adversarial techniques, the normal detection methods (even with ML-based visualization approaches) can hardly adapt to the requirements of different detection objects or environments in recent research [37] on malware determination.
3. Methodology
In this section, the proposed method, Visual-AT, for ML-based visualization detection of malware is described in detail. By using adversarial techniques, Visual-AT improves not only the effectiveness of the detection model but also its accuracy and robustness. First, we provide an overview of Visual-AT in Sect. 3.1. Second, we describe how to craft AEs to simulate malware variants in Sect. 3.2. Third, we elaborate on how the detection model can be enhanced with AT. Finally, according to existing research, this paper analyzes the computational cost of the proposed Visual-AT method in detail.
3.1. Overview
Fig. 2 shows the flow chart of Visual-AT, which contains five functional modules: Data Preprocessing, Pre-training Process, AE Generation Process, Adversarial Training & Detection Process, and Output Results. Each module plays a specific role, and the colored arrows denote the flow direction of the corresponding dataset. Since researchers usually obtain malware datasets with all of these existing classes, the proposed method can be applied to address attacks in a gray-box setting [42].
Figure 2: Flowchart of the proposed Visual-AT method. (Data preprocessing: bytes files, malware binary, malware grayscale image, rescale & preprocess image. Pre-training process: input and output. AE generation process: AE Type 1, AE Type 2, ..., AE Type n, combined with the original dataset into the enlarged dataset. Adversarial training & detection process: detection methods such as CNN and SVM. Output results: malware (M), e.g. Ramnit (R), Lollipop (L), Gatak (G), ..., or benign sample (B).)
Visual-AT first converts malware code into feature images (or malware images) and then rescales these transformed images to the same size in the Data Preprocessing stage. After these preprocessing measures (transformation and normalization), the image samples are used for ML-based malware detection, such as with CNN and SVM. To generate AEs and simulate the corresponding subtle variants of certain malware types, the original dataset is passed to the Pre-training Process and Adversarial Training & Detection Process stages after Data Preprocessing. If no AEs are available, the preprocessed dataset is passed to the final stage directly, which yields only a limited determination. However, with the help of adversarial techniques in the Pre-training Process and AE Generation Process stages, the proposed method crafts a subset of targeted AEs to enlarge the dataset. Finally, Visual-AT is enhanced with a suitable regularization on the generated dataset. Therefore, this method is able to intentionally craft subtle perturbations that simulate malware variants and camouflage from the original malware dataset via different AE generation methods.
3.2. Generation of Malware Variants
Visual-AT generates malware variants by exploiting the targeted nature of AEs. The FGSM and C&W's attack methods are very popular AE-crafting methods [8, 43] in terms of transferability, robustness, and overhead. Both of them are therefore used to generate targeted malicious samples that simulate malware variants and effectively improve the discrimination accuracy.
3.2.1. Visual-AT FGSM method
FGSM [39] is a fast and robust method that performs only a one-step gradient update along the direction of the sign of the gradient at each pixel. The perturbation can be expressed as
$$\delta = \varepsilon \cdot \mathrm{sign}(\nabla_x J_\theta(x, l)),$$
where $\varepsilon$ represents the distortion between AEs and original samples, $\mathrm{sign}(\cdot)$ denotes the sign function, $\nabla_x J_\theta(\cdot, \cdot)$ computes the gradient of the cost function $J$ with respect to the input $x$ at the current model parameters $\theta$, and $l$ is the label of $x$.
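The one-step FGSM update can be illustrated on a toy detector. The sketch below is a minimal illustration under stated assumptions, not the paper's pipeline: the detector is a hypothetical logistic model over four pixels (the paper uses a CNN), the weights, bias and input values are made up, and the final clipping to [0, 1] is an added safeguard for valid pixel values.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Toy detector (assumption): probability that image x is malware."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sign(v: float) -> int:
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def fgsm(x: Sequence[float], label: int, w: Sequence[float], b: float,
         eps: float) -> Tuple[List[float], List[float]]:
    """Return (x_adv, delta) with delta = eps * sign(grad_x J(x, label))."""
    p = predict(x, w, b)
    # For cross-entropy loss J and a logistic model, dJ/dx_j = (p - l) * w_j.
    grad = [(p - label) * wi for wi in w]
    delta = [eps * sign(g) for g in grad]
    # Keep the adversarial image inside the valid pixel range [0, 1].
    x_adv = [min(1.0, max(0.0, xi + di)) for xi, di in zip(x, delta)]
    return x_adv, delta


if __name__ == "__main__":
    # Made-up pixel values and model parameters, for illustration only.
    x = [0.2, 0.8, 0.5, 0.1]
    w = [2.0, -1.0, 0.5, 3.0]
    b = 0.5
    x_adv, delta = fgsm(x, label=1, w=w, b=b, eps=0.15)
    print("P(malware) before:", predict(x, w, b))
    print("P(malware) after: ", predict(x_adv, w, b))
```

Because every pixel moves by exactly ±ε in the direction that increases the loss, a single step is enough to lower the malware probability of this toy sample, which is the property that makes FGSM cheap for generating many variants.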
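Figure 3: Process of Perturbed Visualized Malware. (The original image x, detected as “Obfuscator.ACY” with 81.3% confidence, plus the perturbation $\frac{1}{2}(\tanh(w) + 1) - x$ yields x*, detected as “Benign sample” with 99.7% confidence.)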
To achieve accurate detection, the FGSM method is optimised to exploit an index $i$ for the maximum gradient, i.e. until $\arg\max(f(x^*)) = y^*$ or until the value of index $i$ reaches the threshold given by the maximum index $i_{max}$, following the existing works [39, 8]. Since the quantity of generated AEs is limited for the training dataset, a random normal distribution [44] is introduced to fine-tune the distortion parameter ($\varepsilon$) for AT. The perturbations ($\delta$) then follow
$$f(\delta \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(\delta-\mu)^2}{2\sigma^2}},$$
where $\mu$ is the expectation of the distribution and $\sigma$ is the standard deviation. In this way, Visual-AT can rapidly simulate different types of malware variants and disguises.
3.2.2. Visual-AT C&W's attack method
L-norm distance-based C&W's attacks [45], e.g. $l_0$, $l_2$ and $l_\infty$, are also used in the AE generation module. To optimize the penalty term and distance approximation, the basic problem of this targeted attack is expressed as
$$\arg\min_{\delta} \|x^* - x\|, \quad \text{s.t. } f(x^*) \neq f(x),$$
where $\delta = x^* - x$. A new objective function $g$ is then defined as
$$\min_{\delta} \|\delta\|_p + c^* \cdot g^*(x + \delta), \quad \text{s.t. } x + \delta \in [0, 1]^n.$$
Note that the penalty term and distance can be further optimized if and only if $f(x^*) = y' \neq y$. In this case, the optimization formulation can be modified as follows. Taking the $l_2$ attack as an example, the original single term $c^* \cdot g^*(\cdot)$ is divided into two parts, $r \cdot g_r(\cdot)$ and $d \cdot g_d(\cdot)$, as $c^* \cdot g^*(\cdot) = r \cdot g_r(\cdot) + d \cdot g_d(\cdot)$, which denote the corresponding loss functions in a detector. Additionally, $r$ and $d$ are chosen via binary search. Algorithm 1 illustrates the pseudo-code of the optimized C&W's attack-based method for the $l_2$-norm attack, according to the existing works [9, 43].
Algorithm 1 Crafting AEs with C&W's attack-based method
Input: $x$, $y$, $\varepsilon$, $f(\cdot)$
Output: $x^*$, $\delta$
  $x^* \leftarrow x$  // data preprocessing
  $\delta = \frac{1}{2}(\tanh(w) + 1) - x$
  while $\arg\min_{\delta}(D(x, x + \delta))$ and $f(x^*) \neq y$ do
    $\min_{w} \|\frac{1}{2}(\tanh(w) + 1) - x\|_2 + c \cdot g(\frac{1}{2}(\tanh(w) + 1))$  // optimize w
    $g(x^*) = \max(\max_{i \neq t}(Z(x^*)_i) - Z(x^*)_t, -\kappa)$  // where $Z(\cdot)_i$ is the softmax function for class i
    $c \cdot g(x^*) = r \cdot g_r(x^*) + d \cdot g_d(x^*)$  // where r and d are chosen via binary search
    if $w_{max} \leq 0$ then
      return Failure
    end if
    $\delta = \varepsilon \cdot \frac{1}{2}(\tanh(w) + 1) - x$  // update δ
    $x^* \leftarrow x + \delta$
  end while
  return $x^*$, $\delta$
Fig. 3 shows an example of the AE perturbation process on the MS BIG dataset: the correct detection result, with an average confidence of 81.3%, is turned into an induced erroneous detection result with an average confidence of 99.7% by C&W's attack with distortion ε = 0.45.
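The two ingredients of Algorithm 1 that are easiest to misread, the tanh change of variable and the penalty g, can be sketched on a toy problem. The code below is a simplified illustration under explicit assumptions rather than the optimized method itself: the detector is a hypothetical two-class linear scorer (class 0 = benign, class 1 = malware), a single fixed constant c replaces the split r · g_r + d · g_d and its binary search, the squared l2 distance of the original C&W formulation is used, and the gradient is estimated by finite differences instead of backpropagation.

```python
import math
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def to_image(w: Sequence[float]) -> List[float]:
    """Change of variable x* = (tanh(w) + 1) / 2, so every pixel stays in [0, 1]."""
    return [0.5 * (math.tanh(wj) + 1.0) for wj in w]


def scores(x: Sequence[float], weights: Matrix, biases: Sequence[float]) -> List[float]:
    """Class scores Z(x) of a toy linear detector (an assumption of this sketch)."""
    return [sum(a * xj for a, xj in zip(row, x)) + bk
            for row, bk in zip(weights, biases)]


def g(x_adv: Sequence[float], target: int, weights: Matrix,
      biases: Sequence[float], kappa: float) -> float:
    """max(max_{i != t} Z_i - Z_t, -kappa): non-positive once the target class wins."""
    z = scores(x_adv, weights, biases)
    best_other = max(zi for i, zi in enumerate(z) if i != target)
    return max(best_other - z[target], -kappa)


def objective(w: Sequence[float], x: Sequence[float], target: int, weights: Matrix,
              biases: Sequence[float], c: float, kappa: float) -> float:
    """Squared l2 distance between x* and x plus the weighted penalty c * g(x*)."""
    x_adv = to_image(w)
    distance = sum((a - b) ** 2 for a, b in zip(x_adv, x))
    return distance + c * g(x_adv, target, weights, biases, kappa)


def cw_l2(x: Sequence[float], target: int, weights: Matrix, biases: Sequence[float],
          c: float = 1.0, kappa: float = 0.5, lr: float = 0.05,
          steps: int = 500, h: float = 1e-5) -> Tuple[List[float], List[float]]:
    """Minimize the objective over w by plain gradient descent; return (x*, delta)."""
    # Start from w such that to_image(w) equals x (clipped away from 0 and 1).
    w = [math.atanh(2.0 * min(max(xj, 1e-6), 1.0 - 1e-6) - 1.0) for xj in x]
    for _ in range(steps):
        grad = []
        for j in range(len(w)):
            hi = w[:j] + [w[j] + h] + w[j + 1:]
            lo = w[:j] + [w[j] - h] + w[j + 1:]
            # Central finite difference for the j-th partial derivative.
            grad.append((objective(hi, x, target, weights, biases, c, kappa)
                         - objective(lo, x, target, weights, biases, c, kappa)) / (2.0 * h))
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    x_adv = to_image(w)
    delta = [a - b for a, b in zip(x_adv, x)]
    return x_adv, delta


if __name__ == "__main__":
    # Made-up detector and sample: class 0 = benign, class 1 = malware.
    weights = [[-2.0, 1.0, -0.5], [2.0, -1.0, 0.5]]
    biases = [0.0, 0.0]
    x = [0.8, 0.2, 0.6]
    x_adv, delta = cw_l2(x, target=0, weights=weights, biases=biases)
    print("scores before:", scores(x, weights, biases))
    print("scores after: ", scores(x_adv, weights, biases))
```

Optimizing over w instead of δ removes the box constraint x + δ ∈ [0, 1]^n from the problem, since tanh maps any real w into a valid pixel; the confidence margin κ controls how far past the decision boundary the adversarial sample is pushed.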
3.3. Optimised ML-based Visualization Malware Detection
Since AT can effectively improve the regularization of the detection model, a regularization model is incorporated in Visual-AT to achieve better detection accuracy. In contrast with normal data augmentation, where AEs do not appear in the test set naturally, attackers are highly likely to expose the flaws of detection models with AEs. If the AE-based malware variants are submitted directly to CNN and SVM detectors, the detection results cannot meet the security requirement. Therefore, AEs are employed in the AT process to enhance the detection model with a suitable regularization. On the basis of the existing works [46, 40], the modified loss function of AT is defined as
$$L_{adv}(X, Y, \theta) = D[H(Y), P(Y \mid X + R_{adv}, \theta)],$$
where $R_{adv} = \arg\max_{R;\, \|R\| \leq \epsilon} D[H(Y), P(Y \mid X + R, \theta)]$, and $\theta$ is the model parameter vector during training. $D[P, P']$ denotes a non-negative function that measures the distance between two distributions $P$ and $P'$. The function $H(Y)$ is the distribution derived from the training samples, which the parameterized model strives to approximate. Since the malware's binary indicator vector $X$ does not possess any particular structural properties or interdependencies, a regular feed-forward neural network is applied for the training process. The rectifier $F(\cdot)$ is used as the activation function for each hidden neuron in the training network, and standard dropout and gradient descent are additionally applied. For the normalization of the output probabilities, a softmax layer [47] is employed, which can be expressed as
$$F_i(X) = \frac{e^{x_i}}{e^{x_0} + e^{x_1}}, \quad x_i = \sum_{j=1}^{m_n} w_{j,i} \cdot x_j + b_{j,i},$$
where $F(\cdot)$ is an activation function, $x_i$ denotes the i-th input to a neuron, $w$ is the weight vector in gradient descent, and $b$ represents the corresponding bias.
3.4. Computational cost analysis
As with any machine learning based detection method, the computational cost is an important factor that is worth taking into consideration. For the proposed method, the computational cost mainly depends on the AE generation process [48] and the training process of the detection model [49].
First, this section discusses the asymptotic analysis of the training process and the AE generation process respectively, including the time complexity and space complexity of the proposed Visual-AT method. For the training process of the detection model, the time complexity can be expressed as $Time \sim O(\sum_{l=1}^{D} M_l^2 \cdot K_l^2 \cdot C_{l-1} \cdot C_l)$, according to the research in [49, 50], where $M$ is the edge length of the output feature map of each convolutional kernel, $K$ denotes the edge length of each convolutional kernel, $C$ represents the number of channels of each convolutional kernel, $D$ is the number of convolutional layers of the neural network (i.e., the depth of the network), and $l$ denotes the l-th convolutional layer of the neural network. Meanwhile, the corresponding space complexity is $Space \sim O(\sum_{l=1}^{D} K_l^2 \cdot C_{l-1} \cdot C_l)$, where the space complexity reflects the volume of the method (or model) itself, which is related to the size of the convolution kernel ($K$), the number of channels ($C$), and the depth of the network ($D$).
Referring to the work of [48, 51] for the AE generation process, the time complexity of the gradient optimization-based AE generation method can be expressed as $Time \sim O(\frac{L P_M^2}{\varepsilon})$, where $L$ and $P$ are Lipschitz constants associated with the specific norm $\|\cdot\|$, as $L \equiv L_{\|\cdot\|}$ and $P \equiv P_{M,\|\cdot\|}$, and $\varepsilon$ denotes the gradient distortion parameter. Additionally, the corresponding space complexity can be expressed as $Space \sim O(\frac{L P_x^2}{\varepsilon})$.
In this work, the proposed Visual-AT method contains not only a CNN-based training process for the detection model but also a gradient optimization-based AE generation process. According to the analysis above, the overall time complexity can be expressed as
$$Time \sim O\left(\sum_{l=1}^{D} M_l^2 \cdot K_l^2 \cdot C_{l-1} \cdot C_l + \frac{L P_M^2}{\varepsilon}\right).$$
In addition, the corresponding space complexity of the proposed Visual-AT method can be deduced as
$$Space \sim O\left(\sum_{l=1}^{D} K_l^2 \cdot C_{l-1} \cdot C_l + \frac{L P_x^2}{\varepsilon}\right),$$
where the space complexity reflects the volume of the proposed method and is independent of the size of the input.
The Visual-AT method presented in this paper is a robust method based on a model pre-training process with an ML algorithm. In general, the modeling process accounts for the majority of the computational cost. From the calculation above, we have $Time \sim O(\sum_{l=1}^{D} M_l^2 \cdot K_l^2 \cdot C_{l-1} \cdot C_l) \gg Time \sim O(\frac{L P_M^2}{\varepsilon})$. The corresponding overhead is therefore determined mainly by the selected model framework in the Visual-AT method. Consequently, for the proposed Visual-AT scheme, the computational cost can be approximated as $Time \sim O(\sum_{l=1}^{D} M_l^2 \cdot K_l^2 \cdot C_{l-1} \cdot C_l)$ and $Space \sim O(\sum_{l=1}^{D} K_l^2 \cdot C_{l-1} \cdot C_l)$. Additionally, the computational cost of the detection model's training process is always consistent with that of the corresponding detection methods. In the experiments of Sect. 4.5, this paper discusses the measured results of the AE generation and model training processes of Visual-AT in detail.
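To make the dominant terms concrete, the two sums over convolutional layers can be evaluated directly. The sketch below is illustrative only: the `ConvLayer` record and the two-layer network are hypothetical and do not describe Inception V3, and the results are operation and parameter counts up to constant factors, as the O(·) notation implies.

```python
from typing import NamedTuple, Sequence


class ConvLayer(NamedTuple):
    """One convolutional layer l, using the symbols of the analysis above."""
    out_map: int       # M_l: edge length of the output feature map
    kernel: int        # K_l: edge length of the convolution kernel
    in_channels: int   # C_{l-1}: number of input channels
    out_channels: int  # C_l: number of output channels


def time_cost(layers: Sequence[ConvLayer]) -> int:
    """Sum over layers of M_l^2 * K_l^2 * C_{l-1} * C_l."""
    return sum(layer.out_map ** 2 * layer.kernel ** 2
               * layer.in_channels * layer.out_channels
               for layer in layers)


def space_cost(layers: Sequence[ConvLayer]) -> int:
    """Sum over layers of K_l^2 * C_{l-1} * C_l (independent of the input size)."""
    return sum(layer.kernel ** 2 * layer.in_channels * layer.out_channels
               for layer in layers)


if __name__ == "__main__":
    # Hypothetical two-layer network on a single-channel grayscale image.
    toy_network = [
        ConvLayer(out_map=32, kernel=3, in_channels=1, out_channels=16),
        ConvLayer(out_map=16, kernel=3, in_channels=16, out_channels=32),
    ]
    print("time term: ", time_cost(toy_network))
    print("space term:", space_cost(toy_network))
```

The space term omits M_l entirely, which is why the analysis states that the model volume does not depend on the size of the input image.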
4. Experiments
In this section, verification and experimental evaluation are conducted in terms of the effectiveness and performance of Visual-AT. Firstly, this section introduces the experimental setup, including the necessary routines for preprocessing the collected dataset. Then, two types of experiments are conducted: one verifies the feasibility of the proposed Visual-AT method, and the other tests the extensibility to different detection methods with different AE-crafting methods. Meanwhile, cross-validation for statistical evaluation is also used to analyze the reliability of the proposed approach. Finally, by measuring the computational cost of the proposed method, which includes the training process and the AE generation process, this experiment validates the flexibility and reliability of Visual-AT on a platform with a real malware dataset. The experiment programs are implemented in Python 3.6 and MATLAB 2018a under the operating system CentOS Linux release 7.6.1810 (Core), while the testbed features an Intel(R) Xeon(R) Silver 4110 CPU at 2.10GHz, 128GB RAM, and a 16GB NVIDIA Quadro P5000 GPU with CUDA 10.1. The detailed descriptions of each experiment are as follows.
310
4.1. Dataset and Setups Verification and experimental evaluation are conducted in terms of the effectiveness and performance of Visual-AT. Firstly, an open source malware dataset in Kaggle Microsoft Malware Classification Challenge (BIG 2015) [27] is used in this experiment, which consists of 10,678 labeled malware samples with
315
nine classes. For benign executables, these samples are collected by scraping all the valid executables from a freshly installed Linux CentOS 7.6, Windows 10, and iOS 11.3 on the virtual machine. By using anti-virus vendors in the VirusTotal search [52], 1200 benign file samples have been selected. The distribution of these samples is illustrated in Table 1. Secondly, considering the renewal and
320
variant of malware dataset another latest dataset, named Ember 2018 (Endgame Malware BEnchmark for Research) [53], is also collected for this experiment, which is an open source collection of 1.1 million portable executable files that were scanned by VirusTotal as well. From the Ember 2018 dataset, 4000 training samples (2000 malicious and 2000 benign) and 1000 test samples (500 malicious,
325
500 benign) are collected in the further experiment. In order to evaluate the Visual-AT method, this work conducts extensive experiments on both the BIG 2015 and Ember 2018 datasets. Firstly, in section 4.2, it presents experiments designed to validate the effectiveness of the proposed method by making a quantitative comparison between different malware detec16
Table 1: MS BIG dataset with malware class distribution & benign samples.
Types of Malware
330
Number of Samples
Ramnit (R)
1534
Lollipop (L)
2470
Kelihos ver3 (K3)
2942
Vundo (V)
451
Simda (S)
41
Tracur (T)
685
Kelihos ver1 (K1)
386
Obfuscator.ACY(O)
1158
Gatak (G)
1011
Benign sample(B)
1200
tion methods. Secondly, in section 4.3, it demonstrates experiments aimed at evaluating the performance index of Visual-AT through testing the accuracy of these constructed model during AE generation. Thirdly, in section 4.4, it also illustrates the comparison of different methods for malware detection, which includes the detection accuracy comparison between four similar latest works in
the field and the proposed Visual-AT. Finally, Section 4.5 reports the computational cost of the proposed method, including the memory usage, volatile GPU-util, CPU utilization, power usage, and the time consumed by model training in AE generation.

In these experiments, considering the robustness and expandability of the neural network architecture, the GoogleNet Inception V3 (https://github.com/BartyzalRadek/Multi-label-Inception-net) is adopted in Visual-AT for AE generation and pre-training. During AE generation and pre-training, the targeted AEs are built with the FGSM and C&W’s attack methods under different values of the distortion parameter ε, and are then used in the follow-up testing experiments. To verify the effectiveness of the proposed method, the targeted AEs generated from the BIG dataset are used to compute the detection accuracy of Visual-AT, which is compared against traditional ML-based visualization malware detection methods, such as CNN and SVM. The evaluation metrics are the accuracy, false positive rate, and false negative rate, which allow a quantitative evaluation. A high accuracy with both a low false positive rate and a low false negative rate indicates better performance. Moreover, in order to evaluate the performance of Visual-AT, the AEs generated from the Ember dataset are used to test the detection accuracy of the constructed models during AE generation. By varying the size of the training dataset and the parameters of the AE generation process, the experiments show a clear trend in detection accuracy. Malware detection results on the different datasets are therefore presented, along with a discussion of the ability to resist AEs. Finally, by running the model pre-training process with different dataset sizes and evaluating the different AE generation methods in Visual-AT, the computational cost of the proposed method is also reported in this section. Parameters and settings are described in detail below.

4.2. Quantitative Comparison for Different Malware Detection Methods

4.2.1. Results of Malware Detection with Pure Dataset
A pure dataset drawn from the BIG dataset, without AEs, is used first; it contains only the transformed malware samples. After data pre-processing, the selected samples are partitioned for 10-fold cross-validation. These data are fed to the SVM- and CNN-based malware detectors and to the corresponding detectors enhanced with Visual-AT. During the pre-training procedure, two different AE crafting methods are used to generate AEs for Visual-AT. The distortion parameter ε is 0.5 for FGSM and 0.35 for C&W’s attack. The SVM-based detector uses the parameters γ = 1, C = 1.0 and k = 10, while the CNN-based detector uses the default parameters of GoogleNet Inception V3.
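As an illustration of how the distortion parameter ε enters AE crafting, the following is a minimal Python sketch of the basic one-step FGSM update, x_adv = x + ε · sign(∇x J). It is not the implementation used in the experiments: a small logistic-regression model stands in for the Inception V3 network, the untargeted form of FGSM is shown rather than the targeted variant, and the names fgsm, w and b and the toy input are assumptions made for this example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fgsm(x, y, w, b, eps):
    """One-step FGSM against a logistic-regression stand-in model.

    x: feature values scaled to [0, 1]; y: true label (0 or 1);
    w, b: weights and bias of the stand-in model (hypothetical);
    eps: distortion parameter.
    """
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    # For cross-entropy loss, the gradient with respect to the input is (p - y) * w.
    grad = [(p - y) * wi for wi in w]
    sign = [(g > 0) - (g < 0) for g in grad]
    # Step in the direction that increases the loss, then clip to the valid range.
    return [min(1.0, max(0.0, xi + eps * s)) for xi, s in zip(x, sign)]


# Toy input and weights, chosen only for illustration.
x = [0.2, 0.8, 0.5]
w = [1.5, -2.0, 0.5]
x_adv = fgsm(x, y=1, w=w, b=0.0, eps=0.5)
print(x_adv)
```

With ε = 0.5 every feature moves by at most 0.5 before clipping to [0, 1], which is why larger ε values yield AEs that differ more visibly from the original sample.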
In Table 2, the comparison results are listed, including the accuracy, false positive rate, and false negative rate. For the SVM-based detector, Visual-AT increases the average accuracy by 8.64%, while the average false positive and false negative rates drop to 3.65% and 2.31%, respectively. The corresponding figures for the CNN-based detector are 4.47%, 2.93%, and 1.70%, respectively. Compared to both traditional ML models, Visual-AT not only obtains an average accuracy increase of 7.41% but also a decrease of 68.17% in the false positive rate. In addition, in the SVM detector the model using FGSM-based AEs is slightly less accurate than the one using C&W’s attack-based AEs (94.55% versus 96.18%). In the CNN detector, the model using FGSM-based AEs is more accurate than the one using the C&W’s attack-based method (97.56% versus 96.25%). As expected, the difference between gradient descent and norm optimization in the AE generation methods is responsible for these findings. Compared with the traditional detection methods, such as the code analysis and data signature approaches in related work [2, 3, 54], the ML-based visualization detection methods improve the accuracy by about 20% on average, reaching 87.13% for the SVM-based detector and 92.16% for the CNN-based one according to the experiment results in Table 2. Furthermore, adopting the adversarial techniques improves the detection results further. Table 2
lists the comparison between normal ML-based visualization detectors and Visual-AT, where the latter achieves a better performance. The detection accuracy of Visual-AT achieves a 5%∼10% increase and a 5%∼8% false positive rate drop versus traditional methods. The CNN-based detector without AT has an average accuracy of 92.16%, whereas Visual-AT increases it to 96.53%. The corresponding values for the SVM-based detector are 87.13% and 95.76%, respectively.

Table 2: The detection results of different detectors with the pure dataset
(FPR = false positive rate, FNR = false negative rate)

                                      SVM                            CNN
Method                                Accuracy   FPR      FNR       Accuracy   FPR      FNR
Normal Methods                        87.13%     11.57%   4.82%     92.16%     8.62%    3.27%
Visual-AT (Our method):
  FGSM                                94.55%     5.82%    3.76%     97.56%     3.56%    2.10%
  C&W’s attack, l0 attack             96.18%     2.45%    1.48%     96.43%     2.61%    1.31%
  C&W’s attack, l2 attack             95.80%     3.76%    2.35%     96.07%     3.19%    2.13%
  C&W’s attack, l∞ attack             96.53%     2.58%    1.67%     96.24%     2.45%    1.47%

4.2.2. Results of Malware Detection with AEs

The sample set used above is based on the real dataset without crafted AEs. Since recent works have found that the influence of AEs on ML and Artificial
Intelligence (AI) is becoming increasingly severe [43, 37], further experiments were carried out using datasets with AEs. A 10-fold cross-validation test is again used to estimate the performance of the proposed method: a malware AE set of the same size replaces the held-out fold used for prediction testing. In each sub-test, the AE malware set and the AE generation process inside the Visual-AT detector use different AE generation methods. For example, when the AE malware set is crafted by the FGSM-based method, the C&W’s attack-based AE generation process is selected in the Visual-AT detector for that sub-test. The corresponding average results after 10-fold
cross-validation are shown in Table 3. Compared with the malware detection results on the pure dataset in Table 2, traditional ML models suffer a significant deterioration in discriminant performance when encountering interference from AEs. The accuracy drops from 87.13% to 60.23% in the SVM-based detector and from 92.16% to 54.16% in the CNN-based one. The average false positive rate increases from 11.57% to 34.59% and from 8.62% to 48.62% in the SVM- and CNN-based detectors, respectively. These obvious gaps indicate that AEs severely affect the performance of traditional ML-based visualization methods in malware detection.

Table 3: The detection results of different detectors with the obstructed dataset
(FPR = false positive rate, FNR = false negative rate)

                                      SVM                            CNN
Method                                Accuracy   FPR      FNR       Accuracy   FPR      FNR
Normal Methods                        60.23%     34.59%   8.82%     54.16%     48.62%   7.63%
Visual-AT (Our method):
  FGSM                                95.37%     5.12%    2.96%     97.73%     3.17%    1.58%
  C&W’s attack, l0 attack             96.72%     2.45%    1.48%     97.14%     1.98%    1.05%
  C&W’s attack, l2 attack             95.80%     3.47%    2.13%     96.26%     3.18%    2.17%
  C&W’s attack, l∞ attack             96.21%     2.47%    1.71%     96.56%     2.14%    1.52%

More importantly, the proposed Visual-AT method shows strong robustness
and performs even better when AEs are involved in the dataset. Compared with the results of normal ML-based visualization methods in Tables 2 and 3, Visual-AT achieves 1.59× and 1.78× the detection accuracy of the SVM-based and CNN-based detectors, respectively. In addition, there is an obvious drop in the false positive rate of the two ML-based detectors, from 34.59% to 3.36% in SVM and from 48.62% to 2.69% in CNN. Since the AEs can be regarded as potential malware variants, these results indicate that the proposed method is able to defend against the threat of malware variants.

4.3. Performance Analysis of Visual-AT
To further analyze the performance of Visual-AT, the accuracy of the constructed models is tested with different training dataset sizes and different parameter values during AE generation. In this experiment, considering the novelty and variety of malware in recent years, the relatively recent open-source Ember 2018 dataset is used for further evaluation.
4.3.1. The number of AEs in dataset

In general, the robustness and accuracy of an ML-based malware detector are affected by the size of the training dataset, and a bigger dataset tends to give higher prediction accuracy. For the FGSM method, different numbers of AEs are used in the whole experiment dataset (n = 100, 200, 500, 1000, 1500), and the accuracy of Visual-AT in the SVM- and CNN-based detectors is tested with ε = 0.5. To illustrate how the accuracy varies with the number of AEs, the experimental results are shown in Fig. 4. The increasing curves indicate that the accuracy of each detection model grows as the scale of AE generation rises.
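The role of n can be sketched as follows. This is only an illustrative Python sketch under the assumption that adversarial training mixes n crafted AEs, each keeping its true label, into the clean training set; the function name mix_in_adversarial and the toy samples are hypothetical and do not come from the Visual-AT code.

```python
import random


def mix_in_adversarial(clean, adversarial, n, seed=0):
    """Return a shuffled training list: all clean samples plus n sampled AEs.

    Each item is a (features, label) pair.
    """
    if n > len(adversarial):
        raise ValueError("n exceeds the number of available AEs")
    rng = random.Random(seed)
    mixed = list(clean) + rng.sample(adversarial, n)
    rng.shuffle(mixed)
    return mixed


# Toy data: 10 clean samples and 6 AEs that keep the malware label 1.
clean = [([i / 10.0], i % 2) for i in range(10)]
adv = [([2.0 + i], 1) for i in range(6)]
for n in (2, 4, 6):
    train = mix_in_adversarial(clean, adv, n)
    print(n, len(train))
```

Sweeping n in this way and retraining the detector for each value is one way to obtain an accuracy curve of the kind reported in Fig. 4.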
When the number of AEs in the training dataset grows to 1000, the accuracies of the two improved methods reach 96.81% and 97.39%, respectively, as shown in Fig. 4. These results are much higher than those of the traditional CNN and SVM detectors, whose detection results are below an average of 86% in related references. However, the accuracies do not rise indefinitely as the number of AEs increases: once this number exceeds a certain level, the accuracy drops. This phenomenon results from over-fitting of the detection model as the training samples are enlarged. Therefore, a reasonable size of the AE set is of great importance for improving the accuracy of the proposed Visual-AT.

4.3.2. The perturbation parameter of AE’s generation in dataset

Another important factor, the distortion parameter ε, is analyzed in the following to investigate how it affects the discriminant accuracy. Firstly, the FGSM-based method is used to generate a series of AEs with n = 500.
Figure 4: The detectors’ accuracy with different number scale (x-axis: Number, 0 to 1600; y-axis: Accuracy (%), 91 to 98; curves: SVM and CNN)
Using these AE sets, the detection accuracy is tested for the different improved detectors, including SVM and CNN. The experiment varies the distortion from 0.1 to 0.7 during AE generation, and then uses the AEs to calculate the accuracy of the different detectors, as shown in Fig. 5. When the distortion (ε) is set to 0.5, the accuracies of the two methods reach 94.61% and 97.10%, respectively. Although the detectors do not reach a satisfactory accuracy with a small distortion (ε), adjusting ε during AE generation achieves better performance. In Fig. 5, the two lines (red for SVM and green for CNN) rise gradually as the distortion strength increases. However, when the distortion parameter reaches a certain level (ε > 0.5), the accuracy gradually stabilizes or even decreases. Although the range of these fluctuations is less than 0.5%, the influence of such small variations on accuracy cannot be ignored. As with finding the optimal interval in a parameter optimization process, the detection accuracy achieves a relatively optimal result when the distortion (ε) is set between 0.45 and 0.60.

Figure 5: The detectors’ accuracy with different distortion value (x-axis: Distortion (ε), 0.1 to 0.7; y-axis: Accuracy (%), 86 to 98; curves: SVM and CNN)

Since the distortion value directly affects the difference between AEs and the original sample, it can be inferred that as the distortion (ε) increases, the AEs deviate more and more from the original samples. In general, the bigger the distortion parameter (ε), the easier the
desired attack purpose is to achieve, and this difference becomes more and more obvious. Even worse, for the pre-training process of the detection model, excessive perturbation could have the opposite effect on the accuracy, i.e. the over-fitting phenomenon. Therefore, an appropriate perturbation with a suitable distortion value (ε) for AE generation is also of great significance to the Visual-AT method.

4.4. Comparison

To investigate the advantages and disadvantages of the Visual-AT method, four recent studies in this field have been selected for comparison. A decision tree-based ensemble learning method, the Random Forest (RF) method [33] with
a visualization technique for malware detection is the first method compared with the proposed one. A Lempel-Ziv Jaccard Distance (LZJD) with kNN method [55], based on normalized compression distance, is also included in this comparison experiment. Considering the widely used CNN method, two recent works, M-CNN [28] and Lanzcos-CNN [56], are both adopted for the comparison in this section. For the proposed Visual-AT method, FGSM is selected for AE generation and the GoogleNet Inception V3 for both AE pre-training and malware detection. In addition, the whole MS BIG dataset is used in this comparison, which contains the pre-processed and transformed malware samples. Finally, all of these selected samples are partitioned for both 5-fold and 10-fold cross-validation to measure the accuracy. After three groups of experiments for each cross-validation test with the same settings, the average accuracy and comparison results of the four similar recent works and the Visual-AT method are obtained, as listed in Table 4. As shown in Table 4, these methods obtain a high accuracy; the maximum average accuracy of both the M-CNN method and Visual-AT is beyond 98%. Comparing these five methods, one can find that the CNN-based detection methods (such as M-CNN, Lanzcos-CNN, and Visual-AT) generally obtain a higher accuracy than the others based on the
algorithm structure with non-neural network (such as RF and LZJD with kNN). That is, for the methods based on a tree structure or a nearest neighbor method with optimal distance, the detection accuracy is slightly lower than that of the CNN-based ones. Meanwhile, comparing the accuracy results between 5-fold and 10-fold cross-validation, M-CNN, Random Forest, and Visual-AT achieve high stability, with deviations of 0.82%, 0.70% and 0.65%, respectively. In other words, these methods are robust enough to be applied to real-world detection. Therefore, from the analysis above, even though the proposed method (Visual-AT) is not the most accurate method in every setting, considering the experiment results from the comparison
in Table 4 the proposed method shows a good robustness and high accuracy in detection.

Table 4: The comparison of different methods for malware detection

Method                        5-fold Accuracy   10-fold Accuracy
Random Forest (2016) [33]     94.43%            95.13%
LZJD with kNN (2017) [55]     95.47%            96.78%
M-CNN (2018) [28]             97.23%            98.05%
Lanzcos-CNN (2019) [56]       96.72%            95.61%
Visual-AT (ours)              98.21%            97.56%

4.5. Computational Cost Results

As an ML-based method, the computational cost of the proposed method mainly depends on the AE generation process and the training process of the detection model, where the modeling process and training process generally account for the majority of the computational cost. In general, the computational cost of the detection model’s training process is consistent with that of other corresponding detection methods. In this work, the GoogleNet Inception V3 framework has been selected for the training and modeling process in the experiments. Therefore, this subsection discusses in detail the additional computational overhead of AE generation, especially the model pre-training process. In order to clearly represent the computational cost, both the BIG 2015 dataset (with 11,878 samples) and the Ember 2018 dataset (with 5,000 samples)
are used to run the model pre-training with different numbers of iterations in AE generation. The testbed features a CPU at 2.10GHz, 128GB RAM, and a 16GB GPU. At first, the Ember dataset with 5,000 samples is used to execute the model training process with different numbers of iterations. To show how the overhead varies, the numbers of training iterations are chosen an order of magnitude apart, i.e., 100 and 1,000. In Table 5, the computational results are listed in detail, including the Volatile GPU-Util, GPU memory usage, Power: Usage/Cap, GPU fan, and time consumption. These overhead results are average values obtained from three groups of experiments, which could represent
the general attributes of Visual-AT. Moreover, considering the factors of sample size and different datasets, the BIG dataset with 11,878 samples has also been used in the model training process with different numbers of iterations. These computational results are listed in Table 5 as well.

Table 5: Computational cost results for model training of Visual-AT

Sample Size   Training Iteration   Volatile GPU-Util (%)   GPU Memory Usage     Pwr: Usage/Cap (W)   GPU Fan (%)   Time Consumption (s)
5000          100                  39%                     15631MiB/16273MiB    54W/180W             26%           48.92s
5000          1000                 42%                     15631MiB/16273MiB    58W/180W             36%           467.54s
11878         100                  32%                     15631MiB/16273MiB    57W/180W             27%           91.71s
11878         1000                 43%                     15631MiB/16273MiB    69W/180W             37%           903.11s

Even with more than ten thousand samples and one thousand iterations, the computational cost remains moderate, e.g. 43% for Volatile GPU-Util and 69W/180W for power. Generally, 100 iterations are enough to construct a pre-training model. As listed in Table 5, the group with 11878 samples and 100 iterations takes only 32% Volatile GPU-Util and 91.71 seconds for the whole training process. Consequently, comparing
with each experiment group in Table 5, the computational cost of the proposed method is in an acceptable range. Furthermore, considering the whole overhead of Visual-AT, this experiment also measures the computational cost of the AE generation process with the two generation methods (FGSM and C&W’s attack) after model pre-training, even though model pre-training accounts for the main cost in Visual-AT. In this experiment, the same distortion parameter (ε = 0.5) is used to generate AEs with a pre-trained detection model. Table 6 lists the computational cost of AE generation with the two methods. The results include the CPU Utilization (CPU percent), Memory Usage, and consumption time, which are mean values obtained from three groups of the same experiments. The FGSM-based method takes 34.5% in CPU Utilization, 4.4% in Memory Usage (5.648E+09/1.288E+11), and 0.220 seconds for consumption time on average. The C&W’s attack-based method also has a low average computational cost, where the CPU
Utilization is 35.7% with 4.5% of Memory Usage (5.659E+09/1.288E+11) and 9.603 seconds for consumption time. Comparing the two methods in Table 6, the FGSM-based method is slightly better than the C&W’s attack-based method in CPU Utilization and Memory Usage, and considerably faster in consumption time (0.220s versus 9.603s). Therefore, although Visual-AT, as an ML-based detection method, introduces some additional computational overhead, the measured costs are still in a reasonable range.

Table 6: Computational cost results for different AE generation process

AE generation Method          CPU Utilization (%)   Memory Usage (%)   Consumption Time (s)
FGSM-based method             34.5%                 4.4%               0.220s
C&W’s attack based method     35.7%                 4.5%               9.603s
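The figures in Tables 5 and 6 can be put in perspective with a few lines of arithmetic. The Python sketch below uses only values reported above; the variable names are chosen for this example.

```python
# (sample size, training iterations, seconds) as reported in Table 5
runs = [
    (5000, 100, 48.92),
    (5000, 1000, 467.54),
    (11878, 100, 91.71),
    (11878, 1000, 903.11),
]
per_iteration = [secs / iters for _, iters, secs in runs]
for (size, iters, _), t in zip(runs, per_iteration):
    print(f"{size:>6} samples, {iters:>4} iterations: {t:.3f} s per iteration")

# AE generation times (seconds) as reported in Table 6
fgsm_s, cw_s = 0.220, 9.603
ratio = cw_s / fgsm_s
print(f"C&W's attack / FGSM generation time: {ratio:.1f}x")

# FGSM memory usage quoted in the text: bytes used / bytes available
fgsm_mem_pct = 100 * 5.648e9 / 1.288e11
print(f"FGSM memory usage: {fgsm_mem_pct:.1f}%")
```

Per-iteration training time stays roughly constant for a given dataset (about 0.47 to 0.49 s for 5,000 samples and 0.90 to 0.92 s for 11,878 samples), so total training time grows almost linearly with the number of iterations, while C&W’s attack takes roughly 44 times longer than FGSM to generate AEs.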
5. Discussion

This section discusses some possibilities for the adversarial technique to enhance ML-based visualization malware detectors and provides some defensive strategies against AE attacks.

First of all, according to the results above, the detection methods based on SVM are more robust against AE attacks, as Tables 2 and 3 show. For instance, when AEs exist in the testing dataset, the average accuracy of the SVM-based detector is generally higher than that of the CNN-based detector. Since the generation of AEs is based on a DNN with a hierarchical structure, it is reasonable to assume that the algorithm structure influences the detection accuracy, according to the experiment results in Sect. 4. Comparing the linear structure-based SVM detector with the hierarchical structure-based CNN detector, the former achieves better accuracy and is more robust than the latter. One can attribute this phenomenon to the decision feature of a linear algorithm and the application of a kernel function. Although the CNN-based deep learning method (including Visual-AT) is vulnerable to attacks from malware AEs, the detection accuracy can be improved through adversarial training with an enlarged dataset and a suitable regularization in the pre-training process. At present, no published work validates that the structure of different algorithms directly has a large influence on the accuracy of a detection model under AEs. One can therefore infer that the similar hierarchical discriminant structure shared by AE generation and malware detection is an important factor inducing the misjudgment of malware detection.
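The robustness gap discussed in this paragraph can be checked directly from the accuracies of the detectors without adversarial training in Tables 2 and 3. The short Python sketch below only reuses those reported values; the dictionary layout is an assumption of this example.

```python
# Accuracy (%) of the detectors without adversarial training, from Tables 2 and 3
baseline = {
    "SVM": {"pure": 87.13, "with_aes": 60.23},
    "CNN": {"pure": 92.16, "with_aes": 54.16},
}

# Accuracy lost when AEs are introduced, in percentage points
drops = {
    name: round(acc["pure"] - acc["with_aes"], 2)
    for name, acc in baseline.items()
}
for name, drop in drops.items():
    print(f"{name}: accuracy drops by {drop:.2f} percentage points")
```

The SVM-based detector loses 26.90 percentage points under AEs, against 38.00 for the CNN-based detector.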
Secondly, given the differences between malware samples from different datasets, such as BIG 2015 and Ember 2018, this research also analyzes the potential threat from different data samples. The mean distance D_{x*} between the original inputs (x) and the normal AE samples (x*, which simulate many different types of benign or malware samples) is compared with the mean distance D_{x_t} between the original inputs (x) and the corresponding simulated pseudo-benign AEs (malware AEs x_t, which simulate the targeted benign sample). The calculated results show that D_{x*} and D_{x_t} have similar values, with a proportion of 1:1.17. The AEs (x*) generated to obstruct malware detectors are very similar to the pseudo-benign AEs (x_t), and the distribution of these interference values is also close to the uniform distribution. Therefore, it is easy for these targeted AEs (x_t) to mislead the detectors. In practice, it is rather difficult to distinguish normal samples from different AEs. Since the AE generation process follows the optimal gradient toward the target category, whatever the target label is (even an irrelevant one), the detector will finally be induced to produce the result desired by the attackers. From this discussion, it follows that the difference between samples from different datasets does not influence the final detection results.

Thirdly, the Visual-AT method can simulate malware camouflage and variants. Unlike normal dataset augmentation, AEs do not naturally appear in the training set. Therefore, AT can be used to improve the robustness of detection models and to keep their flaws from being exposed by attackers. Fig. 4 and 5 indicate that the accuracy and detection ability of these detectors improve as the number of AEs and the intensity of the distortion parameter rise. However, the accuracy does not grow without limit. When the number of AEs or the intensity of distortion reaches a certain value, the accuracies stop rising and level off or even gradually decrease. According to how the accuracy varies with the AE set size and the distortion values in the experiments above, this phenomenon can be attributed to over-fitting of the detection model when the training parameters become too large. Therefore, it is of great importance to choose a suitable sample set size and a reasonable distortion intensity for a specific ML-based malware detector in the Visual-AT method.

In summary, according to the above analysis of the adversarial technique, three factors can be inferred to affect the accuracy and robustness of malware detection. Firstly, the algorithmic structure of different detectors, such as the linear or hierarchical optimization method. Secondly, the similarity of the difference between AEs. Thirdly, the parameter factors, such as the distribution features of distortion (ε) and the AE dataset scale (n).
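For the second point, the comparison of mean distances can be sketched as below. The text does not state which distance is used, so the Euclidean (l2) distance is assumed here, and the sample vectors are made-up placeholders rather than data from BIG 2015 or Ember 2018.

```python
import math


def mean_l2_distance(originals, perturbed):
    """Mean Euclidean distance between paired samples (flattened feature lists)."""
    dists = [math.dist(x, x_adv) for x, x_adv in zip(originals, perturbed)]
    return sum(dists) / len(dists)


# Hypothetical two-feature samples, for illustration only.
x = [[0.0, 0.0], [1.0, 1.0]]
x_star = [[0.3, 0.4], [1.0, 0.5]]  # stand-ins for ordinary AEs (x*)
x_t = [[0.6, 0.8], [1.0, 0.0]]     # stand-ins for pseudo-benign AEs (x_t)

d_star = mean_l2_distance(x, x_star)
d_t = mean_l2_distance(x, x_t)
print(f"D_x* : D_x_t = 1 : {d_t / d_star:.2f}")
```

Applied to real AE sets, a ratio close to 1 (such as the reported 1:1.17) would indicate that the two kinds of AEs lie at a similar distance from the original inputs.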
6. Conclusion

Malware increasingly poses a serious security threat to the computer industry, especially AI and IoT. As far as we know, this paper proposes the first generic ML-based visualization method, named Visual-AT, to detect malware and its variants with a universal binary format. In addition, it employs the AT technique to detect and analyze originally difficult-to-identify malware as well as potential variants, using the transformed image data and two ML algorithms. Experimental results demonstrate that the accuracy and robustness of the proposed method are greatly improved versus the existing ML-based malware detectors. Visual-AT achieves up to 97.73% accuracy and 96.56% on average for the malware cases tested. Compared with some recent similar methods and the traditional ML-based visualization detection methods, Visual-AT obtains a 28.14% detection-accuracy increase and an 81.17% false positive rate reduction on average.

In future work, since the AEs are trained per classification problem with a horizontal set of labels, an interesting topic will be to develop Visual-AT for new scenarios, such as a hierarchical set of labels (e.g. Animal-Dog-Poodle). Moreover, Visual-AT can be applied to other fields, such as sound and speech processing to help people with recognition impairments. Furthermore, how to optimize the Visual-AT method to extend its generalization to all possible types of malware attacks, including defense against specific attacks such as Distributed Denial of Service (DDoS), is also a challenging topic worth studying. Finally, we believe that the Visual-AT method could have a wide range of applications in machine learning and computer security.
Acknowledgment

This work is supported by the National Natural Science Foundation of China under Grant (No.61874042, No. 61472125, and No. 61602107), the Hu-Xiang Youth Talent Program under Grant No. 2018RS3041, the Key Research and Development Program of Hunan Province under Grant No. 2019GK2082, and the Peng Cheng Laboratory Project of Guangdong Province PCL2018KP004.
References

[1] A. Makandar, A. Patrot, Malware class recognition using image processing techniques, in: Analytics and Innovation (ICDMAI), 2017 International Conference on Data Management, IEEE, 2017, pp. 76–80.
[2] E. Gandotra, D. Bansal, S. Sofat, Malware analysis and classification: A survey, Journal of Information Security 5 (02) (2014) 56.
[3] A. Makandar, A. Patrot, Overview of malware analysis and detection, in: IJCA proceedings on national conference on knowledge, innovation in technology and engineering, NCKITE, Vol. 1, 2015, pp. 35–40.
[4] L. Nataraj, S. Karthikeyan, G. Jacob, B. Manjunath, Malware images: visualization and automatic classification, in: Proceedings of the 8th international symposium on visualization for cyber security, ACM, 2011, p. 4.
[5] K. S. Han, J. H. Lim, B. Kang, E. G. Im, Malware analysis using visualized images and entropy graphs, International Journal of Information Security 14 (1) (2015) 1–14.
[6] T. H.-D. Huang, C.-M. Yu, H.-Y. Kao, R2-d2: Color-inspired convolutional neural network (cnn)-based android malware detections, arXiv preprint arXiv:1705.04448.
[7] K. Grosse, N. Papernot, P. Manoharan, M. Backes, P. McDaniel, Adversarial examples for malware detection, in: European Symposium on Research in Computer Security, Springer, 2017, pp. 62–79.
[8] X. Yuan, P. He, Q. Zhu, R. R. Bhat, X. Li, Adversarial examples: Attacks and defenses for deep learning, arXiv preprint arXiv:1712.07107.
[9] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, R. Fergus, Intriguing properties of neural networks, arXiv preprint arXiv:1312.6199.
[10] X. Liu, J. Zhang, Y. Lin, H. Li, ATMPA: attacking machine learning-based malware visualization detection methods via adversarial examples, in: Proceedings of the International Symposium on Quality of Service, IWQoS 2019, Phoenix, AZ, USA, June 24-25, 2019, pp. 38:1–38:10.
[11] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial nets, in: Advances in neural information processing systems, 2014, pp. 2672–2680.
[12] D. Maiorca, B. Biggio, M. E. Chiappe, G. Giacinto, Adversarial detection of flash malware: Limitations and open issues, arXiv preprint arXiv:1710.10225.
[13] J.-Y. Kim, S.-J. Bu, S.-B. Cho, Malware detection using deep transferred generative adversarial networks, in: International Conference on Neural Information Processing, Springer, 2017, pp. 556–564.
[14] B. Kolosnjaji, A. Zarras, G. Webster, C. Eckert, Deep learning for classification of malware system call sequences, in: Australasian Joint Conference on Artificial Intelligence, Springer, 2016, pp. 137–149.
[15] D. Ucci, L. Aniello, R. Baldoni, Survey of machine learning techniques for malware analysis, Computers & Security.
[16] B. Devyani, B. Poonam, Malware classification and machine learning: A survey, International Journal of Latest Research in Engineering and Technology (IJLRET) 2 (10) (2016) 5.
[17] R. Pascanu, J. W. Stokes, H. Sanossian, M. Marinescu, A. Thomas, Malware classification with recurrent networks, in: 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2015, pp. 1916–1920.
[18] S. Tobiyama, Y. Yamaguchi, H. Shimada, T. Ikuse, T. Yagi, Malware detection with deep neural network using process behavior, in: 2016 IEEE 40th Annual Computer Software and Applications Conference (COMPSAC), Vol. 2, IEEE, 2016, pp. 577–582.
[19] M. Schuster, K. K. Paliwal, Bidirectional recurrent neural networks, IEEE Transactions on Signal Processing 45 (11) (1997) 2673–2681.
[20] D. Kong, G. Yan, Discriminant malware distance learning on structural information for automated malware classification, in: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, 2013, pp. 1357–1365.
[21] H. Xiao, T. Stibor, A supervised topic transition model for detecting malicious system call sequences, in: Workshop on Knowledge Discovery, 2011.
[22] B. Kolosnjaji, A. Zarras, T. Lengyel, G. Webster, C. Eckert, Adaptive semantics-aware malware classification, in: International Conference on Detection of Intrusions & Malware, 2016.
[23] X. Han, C. Eckert, Efficient online sequence prediction with side information, in: IEEE International Conference on Data Mining, 2013.
[24] J. Pfoh, C. Schneider, C. Eckert, Leveraging String Kernels for Malware Detection, 2013.
[25] A. Mohaisen, O. Alrawi, M. Mohaisen, Amal: High-fidelity, behavior-based automated malware analysis and classification, Computers & Security 52 (2015) 251–266.
[26] P. Parmuval, M. Hasan, S. Patel, Malware family detection approach using image processing techniques: Visualization technique, International Journal of Computer Applications Technology and Research 07 (2018) 129–132. doi:10.7753/IJCATR0703.1004.
[27] L. Wang, J. Liu, X. Chen, Microsoft malware classification challenge (big 2015) first place team: Say no to overfitting (2015).
[28] M. Kalash, M. Rochan, N. Mohammed, N. D. B. Bruce, Y. Wang, F. Iqbal, Malware classification with deep convolutional neural networks, in: Ifip International Conference on New Technologies, Mobility and Security, 2018, pp. 1–5.
[29] J. Zhang, Z. Qin, H. Yin, L. Ou, K. Zhang, A feature-hybrid malware variants detection using cnn based opcode embedding and bpnn based api embedding, Computers & Security 84 (2019) 376–392.
[30] K. Kancherla, S. Mukkamala, Image visualization based malware detection, in: Computational Intelligence in Cyber Security, 2013.
[31] L. Liu, B. Wang, Malware classification using gray-scale images and ensemble learning, in: International Conference on Systems and Informatics, 2017, pp. 1018–1022.
[32] S. Pai, F. Di Troia, C. A. Visaggio, T. H. Austin, M. Stamp, Clustering for malware classification, Journal of Computer Virology and Hacking Techniques 13 (2) (2017) 95–107.
[33] F. C. C. Garcia, I. F. P. Muga, Random forest for malware classification.
[34] K. Kosmidis, C. Kalloniatis, Machine learning and images for malware detection and classification, in: Proceedings of the 21st Pan-Hellenic Conference on Informatics, ACM, 2017, p. 5.
[35] S. A. Mohd, M. S. Bin, A. M. Mohd, Classification of malware family using decision tree algorithm, in: Innovations in Computing Technology
780
and Applications, Vol. 02, 2017, pp. 1–8. [36] D. Lee, I. S. Song, K. J. Kim, J.-h. Jeong, A study on malicious codes pattern analysis using visualization, in: Information Science and Applications (ICISA), 2011 International Conference on, IEEE, 2011, pp. 1–5. [37] X. Liu, Y. Lin, H. Li, J. Zhang, Adversarial examples: Attacks on machine
785
learning-based malware visualization detection methods, arXiv preprint arXiv:1808.01546. [38] J. Zhang, C. Li, Adversarial examples: Opportunities and challenges, IEEE Transactions on Neural Networks and Learning Systemsdoi:10.1109/ TNNLS.2019.2933524.
790
[39] I. J. Goodfellow, J. Shlens, C. Szegedy, Explaining and harnessing adversarial examples, arXiv preprint arXiv:1412.6572. [40] F. Tram`er, A. Kurakin, N. Papernot, I. Goodfellow, D. Boneh, P. McDaniel, Ensemble adversarial training: Attacks and defenses, arXiv preprint arXiv:1705.07204.
795
[41] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, A. Vladu, Towards deep learning models resistant to adversarial attacks, arXiv preprint arXiv:1706.06083. [42] T. P. Bohlin, Practical grey-box process identification: theory and applications, Springer Science & Business Media, 2006.
35
800
[43] N. Akhtar, A. Mian, Threat of adversarial attacks on deep learning in computer vision: A survey, arXiv preprint arXiv:1801.00553. [44] J. Aldrich, J. Miller, Earliest known uses of some of the words of mathematics, particular, the entries for” bell-shaped and bell curve”,” normal (distribution)”,” Gaussian”, and” Error, law of error, theory of errors, etc.
805
[45] N. Carlini, D. Wagner, Towards evaluating the robustness of neural networks, in: Security and Privacy (SP), 2017 IEEE Symposium on, IEEE, 2017, pp. 39–57. [46] T. Miyato, S.-i. Maeda, S. Ishii, M. Koyama, Virtual adversarial training: a regularization method for supervised and semi-supervised learning, IEEE
810
transactions on pattern analysis and machine intelligence. [47] A. Krizhevsky, I. Sutskever, G. E. Hinton, Imagenet classification with deep convolutional neural networks, in: Advances in neural information processing systems, 2012, pp. 1097–1105. [48] G. Lan, Y. Zhou, Conditional gradient sliding for convex optimization,
815
SIAM Journal on Optimization 26 (2) (2016) 1379–1409. [49] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE conference on computer vision and pattern recognition (IEEE CVPR 2016), 2016, pp. 770–778. [50] Y. Bengio, et al., Learning deep architectures for ai, Foundations and
820
R in Machine Learning 2 (1) (2009) 1–127. trends
[51] S. Bubeck, et al., Convex optimization: Algorithms and complexity, FounR in Machine Learning 8 (3-4) (2015) 231–357. dations and Trends
[52] Virustotal tool, https://www.virustotal.com/en/, 2018. [53] H. S. Anderson, P. Roth, Ember: an open dataset for training static pe 825
malware machine learning models, arXiv preprint arXiv:1804.04637.
36
[54] F. A. Narudin, A. Feizollah, N. B. Anuar, A. Gani, Evaluation of machine learning classifiers for mobile malware detection, Soft Computing 20 (1) (2016) 343–357. [55] E. Raff, C. Nicholas, An alternative to ncd for large sequences, lempel-ziv 830
jaccard distance, in: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2017, pp. 1007–1015. [56] D. Gibert, C. Mateu, J. Planes, R. Vicens, Using convolutional neural networks for classification of malware represented as images, Journal of
835
Computer Virology and Hacking Techniques 15 (1) (2019) 15–28.
37
Author Biography:
Xinbo Liu received his M.S. degree in Analytical Science from Central South University, China, in 2015. Since 2015, he has been a Ph.D. candidate in the College of Computer Science and Electronic Engineering, Hunan University. His research interests include machine learning, artificial neural networks, information security, data mining, and software security.
Yaping Lin (corresponding author) received the B.S. degree in Computer Application from Hunan University, China, in 1982, and the M.S. degree in Computer Application from the National University of Defense Technology, China, in 1985. He received the Ph.D. degree in Control Theory and Application from Hunan University in 2000. He has been a professor and Ph.D. supervisor at Hunan University since 1996. From 2004 to 2005, he worked as a visiting researcher at the University of Texas at Arlington. His research interests include machine learning, artificial intelligence, network security, and wireless sensor networks.
Jiliang Zhang received the Ph.D. degree in Computer Science and Technology from Hunan University, Changsha, China, in 2015. From 2013 to 2014, he worked as a Research Scholar at the Maryland Embedded Systems and Hardware Security Lab, University of Maryland, College Park. From 2015 to 2017, he was an Associate Professor with Northeastern University, China. He has been with Hunan University since 2017. His current research interests include hardware/hardware-assisted security, artificial intelligence security, and emerging technologies. Prof. Zhang was a recipient of the Hu-Xiang Youth Talent and of best paper nominations at the International Symposium on Quality Electronic Design 2017. He has served on the technical program committees of many international conferences, such as ASP-DAC, FPT, GLSVLSI, ISQED, and AsianHOST, and is a Guest Editor of the Journal of Information Security and Applications and the Journal of Low Power Electronics and Applications.
He Li received the M.S. degree from the Department of Microelectronics at Tianjin University, China. He is currently a Ph.D. student in the Department of Electrical and Electronics Engineering at Imperial College London, UK, and a visiting student at Hunan University. His research interests include custom computing, computer arithmetic, hardware security, and machine learning. He received the Best Paper Presentation award at the 2017 IEEE International Conference on Field-Programmable Technology (FPT), the Student Travel award at the 2017 IEEE Symposium on Computer Arithmetic (ARITH’24), and the outreach award at the 2015 IEEE International System-on-Chip Conference (SOCC).
A CONFLICT OF INTEREST STATEMENT
• None of the authors of this paper has a financial or personal relationship with other people or organizations that could inappropriately influence or bias the content of the paper.
• The authors specifically state that “No Competing interests are at stake and there is No Conflict of Interest” with other people or organizations that could inappropriately influence or bias the content of the paper.