Malware Image Classification Using One-Shot Learning with Siamese Networks


Procedia Computer Science 159 (2019) 1863–1871

www.elsevier.com/locate/procedia

23rd International Conference on Knowledge-Based and Intelligent Information & Engineering Systems

Shou-Ching Hsiao a, Da-Yu Kao b, Zi-Yuan Liu a, Raylin Tso a,∗

a Department of Computer Science, National Chengchi University, Taipei, Taiwan
b Department of Information Management, Central Police University, Taoyuan, Taiwan

Abstract

Machine learning has been widely applied to malware detection and classification, owing to the ineffectiveness of signature-based methods against rapid malware proliferation. Although state-of-the-art machine learning models tend to achieve high performance, they require a large number of training samples. It is infeasible to train such models with sufficient malware samples when facing newly appeared malware variants. Therefore, it is important for security protectors to train a model from a small set of data that can identify malware variants based on a similarity function. In addition, security protectors should keep re-training the models on newly found samples, while typical machine learning models based on massive data are not efficient for such instant updates. Inspired by the recent success of Siamese neural networks for one-shot image recognition, we aim to apply these networks to the malware image classification task. The implementation includes three main stages: pre-processing, training, and testing. In the pre-processing stage, the system transforms malware samples into resized gray-scale images and classifies them within the same family by average hash. In the training and testing stages, Siamese networks are trained to rank the similarity between samples, and the accuracy is calculated through N-way one-shot tasks. The experiment results showed that our networks outperformed the baseline methods. In addition, this paper indicated that our networks were more suitable for malware image one-shot learning than typical deep learning models.

© 2019 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/)
Peer-review under responsibility of KES International.

Keywords: Malware Image Classification; One-Shot Learning; Siamese Convolutional Neural Networks; Average Hash

1. Introduction

With the rapid proliferation of malware variants, malware has posed a severe threat to individuals and enterprises alike. Malware variants modified from previous versions evade detection by presenting different signatures, using sophisticated reproduction techniques [19].

∗ Corresponding author. Tel.: +886-02-29393091 #62328
E-mail address: [email protected]

1877-0509 © 2019 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/)
Peer-review under responsibility of KES International.
10.1016/j.procs.2019.09.358


To keep detection aligned with the speed of malware proliferation, machine-learning-based detection methods have been widely discussed in recent years. Many papers have demonstrated high performance using either supervised machine learning models or deep learning models [15]. However, these models require a large number of training samples and a long learning period, and it is infeasible to collect sufficient samples during the early appearance of new malware variants. In addition, an essential requirement for constructing sustainable learning models is to train on a wide variety of malware samples [10]. Whenever new malware appears, the model needs to be re-trained on the whole large dataset, so the cost of massive sample collection and periodical re-training is too high. This paper adopts one-shot learning with Siamese networks to solve this problem.

One-shot learning refers to the practice of training a model with a small set of data while still avoiding overfitting. The concept is inspired by the human ability to learn new concepts from only a few examples. A standard machine learning method trained with only a few samples will clearly suffer from severe overfitting. In contrast, our method is based on Siamese networks, whose architecture consists of twin convolutional neural networks that share the same parameters. Empirical results have shown that one-shot image recognition using convolutional Siamese networks can achieve a test accuracy of 92.0% on the Omniglot dataset [13, 14]. In our system, malware samples are transformed into an image representation through a series of pre-processing steps and fed into the Siamese convolutional neural networks. The similarity score generated by the output sigmoid layer determines the malware family, which is the core idea behind adopting Siamese networks. In summary, the main contributions are listed below:

• We present a method that detects different types of malware variants under malware proliferation using malware image classification.
• We address the lack of training samples in the early stage of new malware appearance by applying one-shot learning with Siamese networks.
• Our method achieves higher accuracy than the baseline methods on malware image one-shot classification tasks.
• We train one of the typical deep learning models on our dataset, and its testing accuracy shows that such a model does not fit well with scarce training samples.

The remainder of this paper is organized as follows. Section 2 reviews state-of-the-art methods for malware image classification, one-shot learning, and Siamese networks. Section 3 describes the methodology of the one-shot malware image classification system. Section 4 presents the experiments and results. The last section concludes the paper and suggests ideas for future work.

2. Related Work

2.1. State-of-the-art Methods for Malware Image Classification

The first method using binary-to-image processing techniques for malware classification was proposed in [17]. The paper used the k-Nearest Neighbor (k-NN) method with the Euclidean distance to classify malware images. In [16], the authors conducted a comparative analysis of malware classification using image-based binary texture analysis versus dynamic analysis, and showed that the image-based method achieves higher accuracy than dynamic analysis for malware classification.
Agarap and Pepito [2] followed the method in [17, 16] to generate a malware image dataset and explored DL-SVM, an architecture combining a linear Support Vector Machine (SVM) with different deep learning models for image classification. Three kinds of DL-SVM architectures were implemented in the paper: CNN-SVM, GRU-SVM, and MLP-SVM, whose accuracies reached 77.23%, 84.92%, and 80.47%, respectively. While state-of-the-art methods for malware image classification have addressed the ineffectiveness of traditional signature-based malware detectors, they rest on the premise of massive training data.

2.2. One-shot Learning

Whereas typical machine learning algorithms need to be trained on huge datasets, one-shot learning algorithms target training from scarce samples. The initial work on one-shot learning was introduced in the 2000s [5, 6]. The paper




Fig. 1. High-level system overview.

adopted a variational Bayesian framework for one-shot learning on an image classification task. The authors showed that a small set of sample data can be used to train a model for predicting future images. Hierarchical Bayesian Program Learning (HBPL) was later proposed to solve one-shot recognition of handwritten characters using the principles of compositionality and causality. In [23], the authors demonstrated the memory-augmented neural network (MANN), modified from the Neural Turing Machine (NTM) [7], which can immediately learn from new samples and make accurate predictions. MANN keeps relevant information obtained from a few samples by accessing a content-based external memory. Matching networks were inspired by several lines of work, such as memory-augmented neural networks and metric learning [24]. The paper ran different one-shot experiments on Omniglot [14], ImageNet [21], and a language modeling task (Penn Treebank), and specified new one-shot ImageNet tasks on a reduced version for rapid experimentation.

Siamese networks were first presented by Bromley and LeCun [4] in 1993 to verify signatures written on a pen-input tablet. The authors used two copies of the same network to extract feature vectors and determine the similarity of the two input samples. In recent years, Siamese neural networks have become popular and are commonly applied to one-shot learning tasks.

2.3. Siamese Networks

The main concept of Siamese networks is to train the twin networks through a supervised method. Siamese neural networks were applied to one-shot classification on the Omniglot dataset [14] in order to identify whether two input images belong to the same category [13]. The paper reports an accuracy of up to 92.0% once the Siamese neural networks were optimized, and demonstrated that the learned features are capable of discriminating between different characters while aggregating samples of the same category. The DeepFace system is a multi-stage approach for face recognition proposed in [26]. One of its experiments trained a Siamese network using the standard cross-entropy loss and backpropagation of the error. It predicted similarity by the L1 distance between two extracted features and determined whether two faces belong to the same person; the accuracy of the DeepFace-Siamese networks reached about 96.17%.

To provide protection against malware variants, the goal is to calculate the similarity score of a suspicious target with samples from different malware families. However, there is little one-shot literature using Siamese networks in the malware protection field, a gap this paper aims to fill by implementing one-shot learning with Siamese networks in the following sections.

3. Methodology

3.1. System Description

This paper explores the Siamese network architectures that have recently achieved great success in the one-shot image recognition field [13], and applies the concept to malware image classification. Our system includes three main stages: dataset pre-processing, training, and testing (Fig. 1). In the pre-processing stage, malware samples are


Fig. 2. The dataset pre-processing steps.

Fig. 3. The concept of CNN convolutional layer.

transformed into a consistent malware image form. Note that in many malware machine learning tasks, features must be extracted to train the model, and what the feature vectors contain varies with the file type. We generalize the feature representation as a malware image so that our system can deal with different types of malware. A convolutional neural network (CNN) is a popular deep learning method that accepts an image as input and conducts classification based on predictive power learned from large training sets. Since our goal is to address the propagation of new malware variants at their early appearance, we have only a few samples for training. This paper therefore adopts Siamese convolutional neural networks (Siamese CNNs) that share weights between the sub-networks, which reduces the number of parameters to train and mitigates overfitting. After processing the two input images with the twin CNNs, the system generates two feature vectors, denoted v(m1) and v(m2). The Manhattan distance between v(m1) and v(m2) is the input of the sigmoid function, and the resulting similarity score lies in the range [0, 1], where 0 represents no similarity and 1 represents full similarity.
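To make this similarity computation concrete, the following is a minimal sketch, not the authors' released code, written with the tf.keras API of the libraries listed in Section 4.1. The encoder layer sizes, the 2048-unit feature vector, and the binary cross-entropy loss are illustrative assumptions; only the shared-weight twin encoders, the element-wise L1 (Manhattan) distance, the sigmoid similarity output, and the learning rate from Table 1 follow the paper.

```python
# Sketch of the Siamese similarity head: twin encoders with shared weights,
# an element-wise L1 distance layer, and a single sigmoid unit producing a
# similarity score in [0, 1]. Layer sizes are illustrative assumptions.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

IMG_SHAPE = (105, 105, 1)  # gray-scale malware images from the pre-processing stage

def build_encoder():
    return keras.Sequential([
        layers.Conv2D(64, 10, activation="relu", input_shape=IMG_SHAPE),
        layers.MaxPooling2D(2),
        layers.Conv2D(128, 7, activation="relu"),
        layers.MaxPooling2D(2),
        layers.Conv2D(128, 4, activation="relu"),
        layers.MaxPooling2D(2),
        layers.Flatten(),
        layers.Dense(2048, activation="sigmoid"),      # feature vector v(m)
    ])

encoder = build_encoder()                               # built once, applied twice: shared weights
m1, m2 = keras.Input(shape=IMG_SHAPE), keras.Input(shape=IMG_SHAPE)
v1, v2 = encoder(m1), encoder(m2)
l1 = layers.Lambda(lambda t: tf.abs(t[0] - t[1]))([v1, v2])   # |v(m1) - v(m2)|
score = layers.Dense(1, activation="sigmoid")(l1)             # similarity score in [0, 1]
siamese = keras.Model(inputs=[m1, m2], outputs=score)
siamese.compile(optimizer=keras.optimizers.Adam(learning_rate=0.00006),
                loss="binary_crossentropy")
```

Building the encoder once and applying it to both inputs is what ties the twin branches to a single set of weights; the dense layer on the absolute differences acts as a learned, weighted L1 distance before the sigmoid.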

3.2. Dataset Pre-processing

Before feeding the samples into the training networks, the system conducts a series of dataset pre-processing steps, illustrated in Fig. 2. First, malware samples are transformed into 8-bit gray-scale images using the method presented in [17], where each malware sample is mapped to a matrix that can be visualized as a gray-scale image. The 8-bit values lie in the range [0, 255], in which 0 represents black and 255 represents white [2]. Since malware samples have different file lengths, the images are rescaled to 105 × 105 pixels, keeping the original aspect ratio and filling the background with black; bicubic interpolation is used for re-sampling. After generating malware images of consistent size, the next step is to classify the image dataset in more detail. Owing to different reproduction methods, the variance between malware samples in the same family depends on several factors, such as file type, attack vector, or obfuscation technique. Despite being labeled as the same family, samples may therefore vary considerably. To address this problem, the system further classifies the image dataset within each family using the Average Hash (aHash) value [9]. Average Hash is a perceptual hash that generates a fingerprint of the input image; images with the same aHash value are labeled as the same sub-type in our system.
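A minimal sketch of these pre-processing steps is shown below, assuming the Pillow (PIL) and imagehash packages cited in this paper; the fixed image width of 256 bytes per row, the file name, and the helper names are hypothetical, as the paper does not specify them.

```python
# Sketch of the pre-processing pipeline: bytes -> gray-scale image -> 105x105
# (aspect ratio kept, black padding, bicubic re-sampling) -> aHash sub-type label.
import numpy as np
from PIL import Image
import imagehash

def malware_to_image(path, width=256):
    """Read a malware binary as bytes and visualize it as a gray-scale image,
    one byte (0 = black, 255 = white) per pixel, following Nataraj et al. [17]."""
    data = np.fromfile(path, dtype=np.uint8)
    height = max(1, len(data) // width)
    return Image.fromarray(data[: height * width].reshape(height, width), mode="L")

def pad_and_resize(img, size=105):
    """Scale the longer side to `size` pixels with bicubic interpolation,
    keep the aspect ratio, and pad the background with black."""
    scale = size / max(img.size)
    new_w, new_h = max(1, round(img.width * scale)), max(1, round(img.height * scale))
    img = img.resize((new_w, new_h), Image.BICUBIC)
    canvas = Image.new("L", (size, size), color=0)          # black background
    canvas.paste(img, ((size - new_w) // 2, (size - new_h) // 2))
    return canvas

sample = pad_and_resize(malware_to_image("sample.bin"))     # "sample.bin" is a placeholder path
subtype = str(imagehash.average_hash(sample))               # aHash fingerprint -> sub-type label
```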




Fig. 4. (a) The concept of CNN max pooling layer; (b) The concept of CNN fully connected layer.

Fig. 5. The structure of Siamese convolutional neural networks [13].

3.3. Training Model

3.3.1. Siamese Convolutional Neural Networks

A convolutional neural network (CNN) is mainly composed of convolutional layers (Fig. 3), pooling layers (Fig. 4(a)), and fully connected layers (Fig. 4(b)). This paper adopts Siamese convolutional neural networks, implemented as twin CNNs that share the same weights and parameters. The architecture of the Siamese networks implemented in our experiment is shown in Fig. 5. This paper follows the structure in [13] to construct the Siamese CNNs, which has been empirically verified as the best-performing convolutional architecture. The sequence of convolutional layers in the twin CNNs uses filters of different sizes with a fixed stride of one to extract the feature maps. The output of each convolution in the first three layers is fed into a Rectified Linear Unit (ReLU) activation function and a max-pooling operation with stride two. The final convolutional layer is followed by a fully connected layer with a sigmoid activation function, where the feature maps are flattened into a single vector. A standard image classification task processes the image through a series of layers in a CNN (Fig. 3, 4) and presents the result as a probability distribution over all the classes. In contrast to typical CNNs used in standard classification, one additional customized layer is added to calculate the Manhattan distance (L1 distance) between the two extracted feature vectors. The result is then passed through a fully connected layer with a sigmoid function and a single unit. The final output signifies the similarity score of the malware image pair in the range [0, 1].

3.3.2. Creating Mini-Batches of Image Pairs for Training

Training of the Siamese CNNs is conducted per mini-batch. To train the networks effectively, the system chooses image pairs randomly but avoids unbalanced numbers of similar and dissimilar image pairs within a mini-batch. For each mini-batch, anchor images are randomly chosen from different classes, while the pairing images are controlled to be half from the same class and half from a different class (a sketch of this procedure is given below). The networks then use the adaptive moment estimation optimizer (Adam) [12] to update the weights during each mini-batch training iteration.
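As a concrete illustration of this mini-batch construction, here is a minimal sketch under stated assumptions: training images are held in a dict mapping each sub-type label to an array of shape (n_samples, 105, 105, 1); only the batch size of 6 and the 2000 iterations follow Table 1, while `make_batch`, the data layout, and the random generator are hypothetical rather than the authors' code.

```python
# Sketch of mini-batch creation: random anchors, half same-class and half
# different-class partners, labels 1 (similar) / 0 (dissimilar), one Adam
# update per mini-batch.
import numpy as np

rng = np.random.default_rng(0)

def make_batch(data, batch_size=6):
    classes = list(data.keys())
    anchors, partners, labels = [], [], []
    for i in range(batch_size):
        c = rng.choice(classes)                               # anchor class
        anchor = data[c][rng.integers(len(data[c]))]
        if i < batch_size // 2:                               # similar pair
            partner = data[c][rng.integers(len(data[c]))]
            labels.append(1.0)
        else:                                                 # dissimilar pair
            other = rng.choice([k for k in classes if k != c])
            partner = data[other][rng.integers(len(data[other]))]
            labels.append(0.0)
        anchors.append(anchor)
        partners.append(partner)
    return [np.stack(anchors), np.stack(partners)], np.array(labels)

# Training loop (2000 iterations, Table 1), reusing the `siamese` model from the
# earlier sketch:
# for _ in range(2000):
#     pair, y = make_batch(train_data)
#     siamese.train_on_batch(pair, y)
```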


3.4. Testing Process

In the testing process, the dataset is composed of classes that have not appeared in the training stage, in order to validate our networks. One advantage of testing on a new dataset is to prove that the trained model does not overfit. In addition, testing on a new dataset can simulate malware proliferation in a real-world scenario. The testing process conducts M N-way one-shot learning tasks, where Q correct predictions yield the accuracy calculated by the following formula [13]:

Accuracy = (100 × Q / M)%    (1)

In each task of N-way one-shot learning, the system first chooses an anchor image $\hat{x}$ from one class, and then randomly selects images from N classes to form the support set $X = \{x_i\}_{i=1}^{N}$, where $x_1$ is chosen from the same class as $\hat{x}$. For every $x \in X$, the system calculates the similarity score to $\hat{x}$ through our Siamese networks, yielding the vector of similarity scores $S = \{s_i\}_{i=1}^{N}$. If $s_1$ is the maximum of $S$, the task is labeled as a correct prediction; otherwise, it is counted as an incorrect result. Once the network has been tuned and validated, the system can apply the predictive power of the networks not just to new malware variants, but to completely new malware classes.

3.5. Baseline Methods for Comparison

3.5.1. 1-Nearest Neighbor

The k-nearest neighbors (k-NN) algorithm is a popular solution for one-shot learning tasks. As the support set is composed of a single sample from each selected class, this paper implements k-NN with k = 1 (1-nearest neighbor) to calculate the distance between each pair of input images and find the nearest one. Pairs of images are formed in the same way as in the Siamese-network process. After forming the support set $X = \{x_i\}_{i=1}^{N}$, the anchor image $\hat{x}$ from one class is compared against every image in the support set, and the system calculates the Euclidean distance (L2 distance):

$$C(\hat{x}, X) = \arg\min_{x_i \in X} \lVert \hat{x} - x_i \rVert \qquad (2)$$

The shorter the distance, the more similar the two images are judged by the nearest-neighbor algorithm. Accordingly, the support-set image with the smallest distance is predicted to belong to the same class as the anchor image $\hat{x}$.

3.5.2. Random Guessing

Random guessing starts by randomly initializing the weights until the system is able to correctly classify all training images. The accuracy of random guessing is sensitive to the N-way value in the testing process, with an expected value of about 100/N %.

4. Experiments and Results

4.1. Learning Library

Keras [11] is a high-level neural network API written in Python. In the training and testing stages, this paper followed [3, 13] in using Keras with TensorFlow [1] as the backend, together with other scientific computing libraries such as matplotlib [8], NumPy [22], and scikit-learn [18]. In the dataset pre-processing stage, the system used the Python Imaging Library (PIL) [20] to conduct all transformations and classification on malware images.

4.2. The Dataset

The malware samples were mainly collected from VirusShare [25]. We selected these samples to provide different malware families in the training and testing datasets. The training dataset contained 35 malware families, and the testing dataset was composed of 17 malware families. According to the aHash value, each malware family was further divided into a number of sub-types through the dataset pre-processing steps. The training dataset and the testing dataset included 116 and 63 sub-types in total, respectively, and each sub-type contained three malware samples.
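Before turning to the evaluation criteria, the following minimal sketch shows how the N-way one-shot tasks of Sections 3.4 and 3.5 can be run, reusing the hypothetical data layout of the earlier sketches (each class maps to an array holding at least two images). The `score_fn` argument lets the same loop evaluate either the Siamese model or the 1-Nearest-Neighbor baseline; all names here are illustrative assumptions, not the authors' code.

```python
# Sketch of the N-way one-shot evaluation (Eq. 1): the support image at index 0
# shares the anchor's class, so a task is correct when it receives the top score.
import numpy as np

rng = np.random.default_rng(0)

def one_shot_accuracy(data, score_fn, n_way=5, n_tasks=150):
    classes = list(data.keys())
    correct = 0
    for _ in range(n_tasks):
        chosen = rng.choice(classes, size=n_way, replace=False)
        anchor = data[chosen[0]][0]                        # anchor image
        support = [data[c][1] for c in chosen]             # first support image shares the anchor's class
        scores = [score_fn(anchor, x) for x in support]
        correct += int(np.argmax(scores) == 0)
    return 100.0 * correct / n_tasks                       # Accuracy = 100 * Q / M

# Siamese scoring: similarity predicted by the trained network (higher = more similar).
siamese_score = lambda a, b: float(siamese.predict([a[None], b[None]], verbose=0)[0, 0])
# 1-NN baseline: negative L2 distance, so argmax still picks the nearest support image.
nn_score = lambda a, b: -float(np.linalg.norm(a.ravel() - b.ravel()))
```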




4.3. Evaluation Criteria

The performance was evaluated by N-way one-shot learning tasks, in which a malware image was first chosen from among those reserved for the evaluation set, along with N malware images taken uniformly at random. The testing accuracy was calculated by Eq. 1.

4.4. Results

The experiments in this paper were conducted on a desktop computer with an Intel Core (TM) i7-8700HQ CPU @ 3.20GHz, 16 GB of DDR3 RAM, and an NVIDIA GeForce GTX 1060M 6 GB DDR5 GPU. Table 1 summarizes the hyper-parameters used by our proposed networks in the experiments. The results are presented in the following subsections, including the accuracy of our proposed networks, a comparison with the baseline methods, and the training of another deep learning model with our dataset.

Table 1. Hyper-parameters used in the Siamese networks

Hyper-parameters | Value   | Description
Mini-Batch size  | 6       | No. of image pairs propagated through the network per gradient update.
Learning rate    | 0.00006 | One of the parameters in the Adam optimizer used to approach the best solution; the other Adam parameters follow the original paper [12].
N-iterations     | 2000    | Total number of iterations in the training process.
N-way            | 1 to 15 | No. of classes chosen for testing in each one-shot task.
N-tasks          | 150     | No. of one-shot tasks to validate on.

4.4.1. The Accuracy of Siamese CNNs

After saving the trained networks, we conducted the performance evaluation. The networks were fed with the training dataset and the testing dataset respectively, and the N-way value was iterated from 1 to 15 (Table 2). The results showed that our networks performed relatively well on both the training and testing datasets with smaller N-way values, while the networks were more prone to erroneous predictions with higher N-way values. A small gap remained between the training dataset and the testing dataset when applying the trained networks.

Table 2. The accuracy of N-way one-shot learning for different methods.

N-way | TR (Siamese CNNs, training dataset) | TE (Siamese CNNs, testing dataset) | NN (1-Nearest Neighbor) | RG (Random Guessing)
1     | 100.0 | 100.0 | 100.0 | 100.0
2     | 95.3  | 92.6  | 66.6  | 50.0
3     | 93.3  | 84.0  | 60.0  | 33.3
4     | 87.3  | 72.0  | 41.3  | 25.0
5     | 82.3  | 75.3  | 40.0  | 20.0
6     | 68.6  | 76.6  | 44.6  | 16.6
7     | 74.6  | 70.0  | 38.0  | 14.2
8     | 70.6  | 60.6  | 27.3  | 12.5
9     | 68.6  | 62.6  | 25.3  | 11.1
10    | 59.3  | 60.6  | 24.6  | 10.0
11    | 62.0  | 54.0  | 24.6  | 9.1
12    | 62.6  | 47.3  | 20.6  | 8.3
13    | 60.6  | 52.6  | 22.0  | 7.7
14    | 55.3  | 42.6  | 15.3  | 7.1
15    | 60.0  | 42.6  | 24.6  | 6.7

4.4.2. Comparison with Baseline Methods

The performance of our Siamese networks was compared with other baseline methods using the same malware image dataset. The baseline methods are 1-Nearest Neighbor using the L2 distance and random guessing.


Fig. 6. (a) 5-way accuracy comparison; (b) 10-way accuracy comparison; (c) 15-way accuracy comparison. Plotted using matplotlib [8].

The comparison is evaluated over different N-way values of one-shot learning tasks. We conducted three experiments, setting the maximum N-way value to 5, 10, and 15 respectively; the results are plotted in Fig. 6. They show that regardless of the N-way value, our proposed Siamese networks achieved higher accuracy on both the training and testing datasets than the baseline methods. Although the accuracy of both the Siamese networks and the baseline methods declined as the N-way value increased, our networks still improved on the baseline methods.

4.4.3. Training Other Deep Learning Models with Our Dataset

Agarap and Pepito built an intelligent anti-malware system that augments deep learning models with a support vector machine (SVM) for malware classification [2]. The paper presented convincing experimental results by training on 9,339 malware samples from 25 different malware families. We attempted to find out whether such deep learning models can also reach high accuracy when trained with little data. We selected [2] for comparison because its source code is open and the paper also used malware images as the feature representation. Our dataset was used to train one of the deep learning models implemented in [2], CNN-SVM; the hyper-parameters and the training process remained the same as in the experiments of [2]. The average testing accuracy of the trained model was only 29.56%, with a high proportion of erroneous predictions in several classes. The result implies that scarce training data is not suitable for typical deep learning models, as the models suffer severely from the curse of dimensionality; it is infeasible to apply typical deep learning models to one-shot learning tasks. Since the scenario of our research is to counter malware proliferation at its early stage, our system needs to be trained with scarce training samples.

5. Conclusions and Future Works

This paper presented the concept of malware image classification using one-shot learning with Siamese networks. Since typical deep learning models do not fit well with small sets of training samples, this paper addresses the problem by adopting Siamese CNNs. The experiments showed that our system performed better than the baseline methods and that Siamese CNNs are more suitable for training with small sets of samples. The results of this paper present a potential solution to early-stage malware appearance scenarios. Future work will focus on achieving better accuracy with high N-way values and on extending our work to different improvements of Siamese networks. With different kinds of methods implemented for the one-shot malware classification task, a comparative analysis can be conducted to pursue the best implementation for early-stage malware protection.

Acknowledgements

This research was supported by the Ministry of Science and Technology, Taiwan (ROC), under Project Numbers MOST 107-2218-E-004-001, MOST 105-2221-E-004-001-MY3, and MOST 107-2221-E-015-002-, and by the Taiwan Information Security Center at National Sun Yat-sen University (TWISC@NSYSU).




References

[1] Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., et al., 2016. TensorFlow: A System for Large-scale Machine Learning, in: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pp. 265–283.
[2] Agarap, A.F., Pepito, F.J.H., 2017. Towards Building an Intelligent Anti-Malware System: A Deep Learning Approach using Support Vector Machine (SVM) for Malware Classification. arXiv preprint arXiv:1801.00318.
[3] Bouma, S., 2013. keras-oneshot. https://github.com/sorenbouma/keras-oneshot. Accessed: 2019-01-05.
[4] Bromley, J., Guyon, I., LeCun, Y., Säckinger, E., Shah, R., 1993. Signature Verification Using a "Siamese" Time Delay Neural Network, in: Proceedings of the 6th International Conference on Neural Information Processing Systems, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, pp. 737–744.
[5] Fei-Fei, L., Fergus, R., Perona, P., 2003. A Bayesian Approach to Unsupervised One-Shot Learning of Object Categories, in: Proceedings of the Ninth IEEE International Conference on Computer Vision - Volume 2, IEEE Computer Society, Washington, DC, USA, pp. 1134–.
[6] Fei-Fei, L., Fergus, R., Perona, P., 2006. One-shot Learning of Object Categories. IEEE Transactions on Pattern Analysis and Machine Intelligence 28, 594–611.
[7] Graves, A., Wayne, G., Danihelka, I., 2014. Neural Turing Machines. arXiv preprint arXiv:1410.5401.
[8] Hunter, J.D., 2007. Matplotlib: A 2D Graphics Environment. Computing in Science & Engineering 9, 90–95.
[9] ImageHash. https://github.com/bjlittle/imagehash/. Accessed: 2019-02-15.
[10] Jordaney, R., Sharad, K., Dash, S.K., Wang, Z., Papini, D., Nouretdinov, I., Cavallaro, L., 2017. Transcend: Detecting Concept Drift in Malware Classification Models, in: 26th USENIX Security Symposium (USENIX Security 17), USENIX Association, Vancouver, BC, pp. 625–642.
[11] Keras. https://keras.io/. Accessed: 2019-01-07.
[12] Kingma, D.P., Ba, J., 2014. Adam: A Method for Stochastic Optimization. arXiv preprint arXiv:1412.6980.
[13] Koch, G., Zemel, R., Salakhutdinov, R., 2015. Siamese Neural Networks for One-shot Image Recognition.
[14] Lake, B.M., Salakhutdinov, R., Gross, J., Tenenbaum, J.B., 2011. One Shot Learning of Simple Visual Concepts, in: CogSci.
[15] Le, Q., Boydell, O., Mac Namee, B., Scanlon, M., 2018. Deep learning at the shallow end: Malware classification for non-domain experts. Digital Investigation 26, S118–S126.
[16] Nataraj, L., Yegneswaran, V., Porras, P., Zhang, J., 2011. A Comparative Assessment of Malware Classification Using Binary Texture Analysis and Dynamic Analysis, in: Proceedings of the 4th ACM Workshop on Security and Artificial Intelligence, ACM, New York, NY, USA, pp. 21–30.
[17] Nataraj, L., Karthikeyan, S., Jacob, G., Manjunath, B.S., 2011. Malware Images: Visualization and Automatic Classification, in: Proceedings of the 8th International Symposium on Visualization for Cyber Security, ACM, New York, NY, USA, pp. 4:1–4:7.
[18] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E., 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12, 2825–2830.
[19] Pirscoveanu, R.S., Hansen, S.S., Larsen, T.M.T., Stevanovic, M., Pedersen, J.M., Czech, A., 2015. Analysis of Malware Behavior: Type Classification using Machine Learning, in: 2015 International Conference on Cyber Situational Awareness, Data Analytics and Assessment (CyberSA), pp. 1–7.
[20] python-pillow. https://python-pillow.org. Accessed: 2019-01-07.
[21] Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., et al., 2015. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision 115, 211–252.
[22] van der Walt, S., Colbert, S.C., Varoquaux, G., 2011. The NumPy Array: A Structure for Efficient Numerical Computation. Computing in Science & Engineering 13, 22–30.
[23] Santoro, A., Bartunov, S., Botvinick, M., Wierstra, D., Lillicrap, T., 2016. One-shot Learning with Memory-augmented Neural Networks. arXiv preprint arXiv:1605.06065.
[24] Vinyals, O., Blundell, C., Lillicrap, T., Wierstra, D., et al., 2016. Matching Networks for One Shot Learning, in: Advances in Neural Information Processing Systems, pp. 3630–3638.
[25] VirusShare. https://virusshare.com/. Accessed: 2019-02-12.
[26] Taigman, Y., Yang, M., Ranzato, M., Wolf, L., 2014. DeepFace: Closing the Gap to Human-Level Performance in Face Verification, in: 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1701–1708.