FAMOUS: Forensic Analysis of Mobile Devices Using Scoring of Application Permissions

Ajit Kumar a, K. S. Kuppusamy a,∗, G. Aghila b

a Department of Computer Science, Pondicherry University, Pondicherry, India 605014
b Department of Computer Science and Engineering, National Institute of Technology Puducherry, Karaikal, India 609609
Abstract: With the emergence of Android as a leading operating system for mobile devices, it has become mandatory to develop specialized, predictive and robust security measures that provide a dependable environment for users. Existing reactive and proactive security techniques are not enough to tackle the fast-growing security challenges in the Android environment. This paper proposes a predictive forensic approach to detect suspicious Android applications. An in-depth study of the statistical properties of the permissions used by malicious and benign Android applications has been performed. Based on the results of this study, a weighted-score based feature set has been created and used to build a predictive and lightweight malware detector for Android devices. In various experiments conducted on the aforementioned feature set, an improved accuracy level of 99% has been achieved with the Random Forest classifier. This trained model has been used to build a forensic tool entitled FAMOUS (Forensic Analysis of MObile devices Using Scoring of application permissions), which is able to scan all installed applications of an attached device and provide a descriptive report. Keywords: Apk Permissions, Static Analysis, Weighted Feature, Machine Learning, Android Malware Triage, Forensic Triage Tool
∗ Corresponding author. Email address: [email protected] (K. S. Kuppusamy)
Table 1: Trend of Android Malware Characteristics 5

Characteristics | 2014 | 2015 | Growth
Personal Data stealer | 63.1 | 50 | -13.1
Botnet | 27.3 | 23 | -4.3
Ransomware | 0.6 | 15.4 | +14.8
Adware | 4.8 | 13 | +8.2
Google-Play's Apps | 9 | 11.5 | +2.5
Root-Exploits | 8.9 | 11.5 | +2.6
Premium rated SMS sender | 8 | 10 | +2
Commercial Trojans or Spy-Kits | 7.5 | 10 | +2.5
Online-Banking Trojans | 10 | 7.7 | -2.3

1. Introduction
Android is one of the most popular operating systems (OS) for smartphones and many other embedded devices. Fig. 1b shows the exponential growth in the usage of Android devices, which held an 85% market share in the first quarter of 2017 1 . One of the major reasons for the popularity of Android is the ability to extend its default features by installing additional applications (commonly called apps or Android apps) from various app markets. In addition to the Google Play Store (the official app market), Android apps are distributed by many other third-party markets such as the Amazon App Store, Samsung Galaxy Apps, SlideME, F-Droid, etc., which have increased the popularity and reach of Android.

With this increased popularity, the threat level has also increased manyfold. Covert propagation and installation of malicious apps is one of the topmost threats faced by Android users [1]. Google's Android security scan results show that out of 400 million scanned devices, 2.6 million were infected with Android malware. Among the infected devices, 0.6 million had applications installed only from the Google Play Store, whereas the remaining 2 million had applications from both the Google Play Store and other third-party sources 2 . The presence of sensitive personal information, Internet connectivity and the ubiquitous nature of the smartphone make it a prime target for cyber attacks. The openness of Android (its open-source code base and the installation of apps from unofficial sources) also draws the attention of attackers and makes it more vulnerable. Table 1 shows the comparative growth of different Android malware characteristics 3 . It can be inferred from Table 1 that most threat characteristics are rising year by year.

The Android platform has been designed with multi-layered security measures: the Linux kernel at the OS level, mandatory application sandboxing at the application level, secure inter-process communication at the process level, application signing at the developer level, and application-defined, user-granted permissions at the end-user level 4 . Fig. 2 illustrates Android's multi-layered security measures. Among all the aforementioned mechanisms, the effectiveness of the permission layer depends entirely on the end-user, which can be viewed either as empowering the user or as burdening them, depending on the perspective. This fully user-dependent security check is the weakest link in Android's security pipeline. To perform tasks on the device such as accessing the Internet, using the camera or reading contacts, each application has to explicitly display and acquire the required permissions during installation, and the user has to decide either to grant the requested permissions and proceed with the installation or to deny them and cancel it. As most end-users are not technically aware enough to make an informed decision, they grant the permissions and install the application without understanding its malicious intent.
This weak spot in Android's multi-layered security has been exploited by attackers, and hence many malicious apps reach end-user devices through various third-party malicious app stores. An extensive study of Android malware in unofficial Android marketplaces found that 5% of apps were malicious [2]. Not only Android apps are under threat; iOS apps are also being targeted. A recent work demonstrated circumvention techniques using 18 popular iOS apps and presented the first iOS cloud app security taxonomy [3]. Malicious Android app detection is an active research domain with approaches spanning from pattern matching to machine learning [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]. In the last decade, several classification and clustering tasks, including malware classification, have been performed using machine learning methods. With the exponential growth in mobile malware, machine learning-based detection techniques initially developed for desktop malware detection have been ported to mobile platforms [16, 8, 17, 12, 18]. In a recent work, an ensemble of multiple classifiers (Support Vector Machine, K-Nearest Neighbor (KNN), Naive Bayes (NB), Classification and Regression Tree (CART) and Random Forest (RF)) was used to detect malicious apps (accuracy 99.39%) and to categorize benign apps (accuracy 82.93%) [19]. Feature sets devised for Android malware classifiers can be grouped into three categories: 1) static, 2) dynamic and 3) hybrid. Static features are extracted without executing the application under consideration. Dynamic features are extracted by logging the activities triggered by the application during execution in a controlled environment. Permissions, intents, meta-data and API calls are examples of static features, whereas network activity, process traces, memory analysis and OS interactions are examples of dynamic features. Hybrid features combine both static and dynamic features. Some studies have presented a taxonomy of mobile malware features and extended the aforementioned grouping by adding applications' meta-data as a new feature group [20]. This paper adopts static features (permissions) to build a machine learning based model for detecting malicious apps.
1 http://www.idc.com/prodserv/smartphone-os-market-share.jsp
2 http://static.googleusercontent.com
3 http://forensics.spreitzenbarth.de/2016/01/03/our-android-malwaresummary-for-the-year-2015/
4 https://source.android.com/security/
5 http://forensics.spreitzenbarth.de/2016/01/03/our-android-malwaresummary-for-the-year-2015/
Figure 1: Smartphone market share. (a) Android and iOS market share from the 1st quarter of 2016 to the 1st quarter of 2017. (b) Smartphone market share in the 1st quarter of 2017.
Figure 2: Android multi-layered security measures

Many tools have been built in recent times to perform effective forensic analysis. Detecting suspicious applications on a mobile device is one of the critical triaging tasks of forensic analysis. DroidAnalyzer is a static analysis tool which identifies potential vulnerabilities of Android apps and the presence of root exploits [21]. Currently, most tools use signature-based anti-virus (AV) engines to detect Android malware, which has two limitations: a) signature-based AV engines are limited to their signature database and represent a reactive security measure; b) most AV engines are built for desktop applications and extended to mobile devices, so they fail to harness mobile-specific features during detection. In this paper, the aforementioned challenges are considered and a new forensic tool termed FAMOUS (Forensic Analysis of MObile devices Using Scoring of application permissions), incorporating a machine-learning based classifier, has been developed. The application's permissions have been used to build a feature set to train various machine learning algorithms. Previous studies [17, 8, 13] have represented permissions as Boolean features, in which requested permissions were represented as 1 and the rest were assigned 0. In this work, a permission scoring system based on statistical analysis of benign and malware permissions has been devised, and a feature set is built based on the score given to each permission. Since machine learning algorithms work on different techniques, their accuracy differs according to domains and features. To build an Android malware classifier with high accuracy, various machine learning algorithms such as Random Forest (with different configurations), Decision Tree, Naive Bayes and SVM have been experimented with on the extracted feature set. The best among all trained and tested classifiers has been used as the scanning engine of the forensic tool FAMOUS, and its prediction is included in a report which gives the forensic expert a clue about the nature of the application. FAMOUS uses the Android Debug Bridge (ADB) connection interface to interact with the device under observation and pulls all installed apks to the local machine for analysis. The proposed tool is extensible and can be enhanced by adding more scanning engines based on different features and algorithms. The objectives of the proposed work are listed below:

• To detect maliciousness of Android applications with a non-signature based method.
• To engineer a new method for feature set construction using non-binary permission representation.
• To build a forensic tool to assist analysts in triaging suspicious Android applications.

The remainder of the paper is organized as follows: Section 2 summarizes related works on Android application classification based on permissions and machine learning techniques. Section 3 describes the proposed FAMOUS model. Details about the dataset, experimental setup, statistical tests and classifier training are provided in Section 4. Results and discussion for the experiments are presented in Section 5. Section 6 lists the conclusions and future directions for the proposed research work.

2. Related Works

Triage is "the process of examining problems in order to decide which ones are the most serious and must be dealt with first" 6 .
Using a similar approach, i.e. deciding which applications need a human analyst or in-depth analysis, we built FAMOUS, a machine learning based triaging tool. In this section, we summarize works related to triaging and to machine learning based classification of Android applications. The proposed work is motivated by the work of Marturana and Tacconi, who presented a machine learning based triaging methodology aimed at automating the categorization of digital media on the basis of plausible connections between the retrieved traces (i.e. digital evidence) and the crimes under investigation [22]. The authors defined a set of crime-related features and, after extracting, processing and storing them as a matrix, used various machine learning algorithms to devise a device classification system. They also demonstrated the usability and effectiveness of their method by experimenting with two use cases: copyright infringement and child pornography exchange. With the aforementioned methodology, they achieved 93% and 100% accuracy for copyright infringement and child pornography exchange respectively [22]. AForensic [23] is a forensic tool that extracts permissions and other meta-data of the third-party applications installed on a device and provides XML (eXtensible Markup Language) output to the forensic analyst. Further, the authors suggest comparing permissions against a permission-based profile built using the Apriori algorithm in order to triage suspicious applications. AForensic has limitations: scanning a large number of samples takes more time, and manual intervention is involved in making a decision. In another work, the authors used machine learning to classify object code by its target architecture and endianness [24]. Byte value histograms were used to identify the target architecture, while heuristic features within the operands were used to determine the endianness of the given object code. This work experimented with 16000 samples of 20 architectures and used Neural Network, Decision Tree, Random Forest, Naive Bayes, BayesNet, SVM (SMO), Logistic Regression and Neural Net algorithms. Many earlier works have presented studies on the classification of Android applications into different classes based on various static, dynamic or hybrid features. Several works are related to the features used by FAMOUS, which are Android permission-based features extracted by static methods. Stowaway [25] is a tool that maps API calls to the permissions requested by an application, which helps in detecting over-privileged applications. This study found that one-third of the 940 applications examined were over-privileged, and pointed out that insufficient API documentation is the main reason for over-privilege. Under a similar hypothesis, another work considered an Android application that requests more permissions than required to be malicious [26]. To counter over-privileged applications, the authors proposed a method that detects such Android applications using permissions and API calls [26]. The work of Fang et al. helps to understand Android's permission-based security issues and their countermeasures [27]. Most earlier works focused only on the required permissions declared by developers, whereas the permissions actually used have received far less attention [28].
Apart from malware-family classification and malware/benign classification of Android applications, some works have used static, dynamic and hybrid features to group applications into various categories. Features extracted from Java class files and XML files were used to build a classifier that labels an apk as belonging to the tool or game category [18]. This study attempted to find the best classifier and the best features to achieve the highest accuracy. Their findings show that the combination of Boosted Bayesian Networks and the top 800 features selected using Information Gain yields an accuracy of 0.918 with an FPR of 0.172. Using permissions (extracted from the application and from the app market), printable strings and metadata (such as rating and file size), a feature set was created and various machine learning based classifiers were trained and tested for automatic categorization of Android apps [29]. A recent study focused on finding discriminatory and persistent features of Android APK files and grouped features into app-specific and platform-defined features [30]. In a different classification work, only the extracted permissions were used as features and 2-layer Neural Networks (NN) were used to build a classifier which can group a given application into 34 different categories [13]. Among the different algorithms, Bayesian regularization gave the highest accuracy of about 60.0%. Another work used permissions together with other manifest XML based attributes and application meta-information as features to cluster APKs into two categories, i.e. tool and business [31]. The aim of that work was to analyze the clustering method and to utilize it in building a machine learning based Android malware classifier. The same approach could be used to differentiate between malicious and benign applications, as in the similar works [18, 29, 13] which focused on the classification of Android applications for different purposes. A recent work developed feature detectors to match and extract various static features such as API calls, Linux system commands and manifest permissions [32]. The authors used a boolean mapping function (1 if a feature is present, 0 otherwise) to create a dataset and used Bayesian classification. With 1600 samples, the aforementioned method was able to achieve an accuracy of about 92% and an AUC of 97%. Similarly, another work used permissions, API calls and the combination of the two as feature sets, trained SVM, J48 and Bagging based methods to build different classifiers, and achieved the best results with the combined permission and API call feature set [33]. MAMA [17] presented an analysis of the manifest file for Android malware detection and created three sets of features by extracting values from different elements of the manifest file. These three feature sets were used to train four machine learning algorithms (kNN, Decision Tree, Bayesian tree and SVM) with different algorithmic configurations. On the combined permissions-and-features feature set, with Random Forest (100 trees), the authors achieved an accuracy of 94.83%, which is about 10% better than the two other feature sets, i.e. permissions-only and features-only [17]. PUMA [8] is an Android malware detector which uses the permissions and uses-features of an application as the feature set to train machine learning algorithms. The authors achieved an accuracy of 87% and an FPR of 0.19 with Random Forest (100 trees).
6 http://dictionary.cambridge.org/dictionary/english/triage
Table 2: Summary of Related Works

Work | Tasks/Objectives | Method | Features
Marturana et al. (2013) [22] | Triage of digital evidence | ML | Crime-related features
Aforensic (2010) [23] | Tool for triaging malicious apps | PM | Perm and MD
Clemens et al. (2015) [24] | Object code classification | ML | Byte value histogram
Geneiatakis et al. (2015) [25] | Over-privileged app classification | ML | Perm and API calls
Shabtai et al. (2010) [18] | Grouping apps into tool and game | ML | Features from Java class files
Sanz et al. (2012) [26] | Automatic app categorization | ML | Perm, printable strings and MD
PUMA (2013) [8] | AMD | ML | Perm
MAMA (2013) [17] | AMD | ML | Perm and features
Milosevic et al. (2017) [27] | AMD | Ensemble | Perm
PIndroid (2017) [28] | AMD | Ensemble | Perm and intents
Yerima et al. (2013) [29] | AMD | Bayesian | Perm, API calls and Linux commands
Peiravian et al. (2013) [30] | AMD | ML | Perm and API calls
Ghorban et al. (2013) [13] | Grouping apps into 34 categories | NN | Perm
Samra et al. (2013) [31] | Clustering apps into tool and business | Clustering | Perm and Manifest's XML attributes
Apk Evaluator (2015) [36] | AMD | Signature | Perm
MAETROID (2016) [38] | App's risk level | Scoring | Perm and MD

Note: ML: Machine Learning, PM: Profile Matching, Perm: Permissions, MD: Metadata, AMD: Android Malware Detection, NN: Neural Network. The works fall into the types A: Triage, B: ML-based Classification, C: Permission-based Android Malware Classification, D: Permission-based Feature Set for App Classification, E: Tools or Applications.
An Android malware classification framework to manage big app markets was proposed, which used 11 different types of static features and an ensemble of various learners [19]. The authors achieved a best accuracy of 99.39% for malapp detection and 82.93% for benign app categorization. In a recent work, a feature set based on permissions and source code analysis was used to classify apps into malicious and benign using machine learning [34]. With the permission-based feature set and ensemble learning, the best result, an F-score of 89.4%, was achieved. PIndroid [35] used permissions and intents to train an ensemble learner and achieved 99.8% accuracy.

Apk Evaluator [36] is a permission-based classification system for Android applications. In the training phase, it uses static analysis techniques to build a permission-based signature database which is used to characterize profiles for Android applications in the evaluation phase. Apk Evaluator was trained, analyzed and tested with 1853 benign samples collected from Google's Play Store and 6909 malicious applications collected from different sources [16, 1, 37]. With these experiments, the authors claimed approximately 88% accuracy with a specificity of 0.925. MAETROID [38] uses the set of requested permissions and a set of metadata retrieved from the marketplace to analyze an app and provide its risk level. A summarized report on permission-based Android malware detection techniques is presented in [39]. In the same work, the author discussed various limitations of previous works and pointed out issues with earlier techniques, such as focusing only on the most-requested permissions instead of considering every permission as potentially risky, limited availability of malware samples, failure of machine learning techniques against mimicry and poisoning attacks, inability to cope with obfuscation strategies, and computational overhead and inefficiency due to the large number of features extracted from the manifest.

One of the major limitations observed in the survey of existing works in this domain is the boolean representation of permissions. The proposed FAMOUS model incorporates a score-based representation for permissions. These weighted permission representations are used to build a feature set which forms the basis of the machine learning model building process. The objective of the proposed FAMOUS model is to build a mobile app triaging tool to assist a human expert in further analysis.

3. FAMOUS: Materials and Methods

FAMOUS is a forensic analysis tool built to triage Android applications and to assist the analyst in selecting applications for further in-depth or manual analysis. Screenshots of the main window and result window of FAMOUS are illustrated in Fig. 3 and Fig. 4. The motivation behind FAMOUS is to overcome the limitations of signature-based triaging forensic tools. The main function of FAMOUS is to assign a proper class label (benign or malware/suspicious) to every selected Android application using the underlying classification engine. Each classification engine is built by training and testing different machine learning algorithms on the proposed permission-score based feature set, which is extracted from a large dataset. Currently, the proof-of-concept implementation ships only the best-performing classifier, but it can easily be extended with more classifiers. The following subsections explain FAMOUS's architecture and its components in detail.

3.1. FAMOUS' Architecture

FAMOUS's architecture consists of two main modules: data acquisition and classification. The main task of the data acquisition module is to extract apk files from the attached Android device. It uses the Android Debug Bridge (ADB) protocol to connect the Android device to the analyst's system. It pulls and lists all installed applications with the size and type of each application (system or third party). FAMOUS does not require a rooted device because apk files can be pulled over the ADB protocol without root access on the device. Once all apk files are pulled from the attached device, access to the pulled apk storage is passed to the classification module, which performs further pre-processing and labels each apk as either benign or malware. The core of the classification module is the Feature Extraction and Scoring (FES) component, which creates a feature set from the given data samples. The output of FES is used to train machine learning algorithms and build a classifier. Several pre-processing tasks (permission extraction, scoring and feature set generation) are carried out on the pulled apks to generate the feature set. Fig. 5 shows a block diagram of FAMOUS' architecture.
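To make the data-acquisition step concrete, the sketch below shows how installed packages can be enumerated and pulled over ADB. This is a minimal illustration of the approach, not FAMOUS's actual code; the helper names are ours, and it assumes the adb binary is on the PATH and a device with USB debugging enabled is attached.

```python
import subprocess
from pathlib import Path

def list_installed_apks(third_party_only=True):
    """Return (package, apk_path) pairs reported by `adb shell pm list packages -f`."""
    cmd = ["adb", "shell", "pm", "list", "packages", "-f"]
    if third_party_only:
        cmd.append("-3")  # -3 restricts the listing to third-party apps
    out = subprocess.check_output(cmd, text=True)
    pairs = []
    for line in out.splitlines():
        # Each line looks like: package:/data/app/com.example-1/base.apk=com.example
        line = line.strip()
        if not line.startswith("package:"):
            continue
        path, _, package = line[len("package:"):].rpartition("=")
        pairs.append((package, path))
    return pairs

def pull_apks(dest_dir="pulled_apks"):
    """Pull every listed apk to the analyst's machine (no root required)."""
    Path(dest_dir).mkdir(exist_ok=True)
    for package, path in list_installed_apks():
        subprocess.run(["adb", "pull", path, f"{dest_dir}/{package}.apk"], check=True)

if __name__ == "__main__":
    pull_apks()
```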
Figure 3: Main window of FAMOUS, listing all applications of the attached device

3.2. Feature Extraction and Scoring (FES)

Feature Extraction and Scoring (FES) has three main components: (1) Permission Extractor, (2) Scoring Engine and (3) Feature Set Generator. The Permission Extractor extracts all requested permissions present in an apk, the Scoring Engine works in the back-end and updates the maliciousness score of each permission, and the Feature Set Generator uses the Scoring Engine's output to assign a score to each permission present in a given Android application. Fig. 6 depicts the three components, their interaction and the control flow among them. The FES component takes malware and benign Android samples as input and produces a feature set in a multi-dimensional vector representation. Each column represents a feature, except the last, which stores the class label, and each row is the vector representation of the permissions present in a sample. Subsections 3.2.1, 3.2.2 and 3.2.3 explain each of these components in detail.

3.2.1. Permission Extractor

An Android application packed as an apk file contains all the files required to run the application. Among other files, every apk must have a manifest file, a binary XML file named "AndroidManifest.xml" as per the Android developer guidelines 7 . Among other essential information, it declares which permissions the application must have in order to access protected parts of the API and to interact with other applications. These permissions are declared as uses-permission and uses-feature elements in the manifest file. Fig. 7 shows a list of permissions requested by a sample application. The Permission Extractor module extracts all permissions declared in the manifest file and writes them to external storage. In FAMOUS, the CSV (Comma Separated Values) file format is used to store all extracted permissions and pass them to the next module for processing. The Permission Extractor does not differentiate between Android's standard permissions and user-defined custom permissions, so it extracts all permissions requested by an Android application. Android standard permissions are declared as android.permission.PERMISSION-NAME, while custom permissions can have any structure but mostly follow the package-name format, with prefixes such as com.android, com.motorola, org. or hr.

3.2.2. Scoring Engine

The Scoring Engine takes the extracted permissions file as input and, in a first iteration, keeps only standard Android permissions. After cleaning the permission format, it counts the occurrences of each permission in the benign and malware categories. The Scoring Engine calculates the values of the variables B, M, PuB and PuM, which are used to compute BSP, MSP and EMSP. B and M represent the total number of samples in the benign and malware classes respectively. Permission used in Benign (PuB) and Permission used in Malware (PuM) represent the count of each permission in the respective group. After calculating these preliminary variables, the Benign Score of a Permission (BSP) and the Malicious Score of a Permission (MSP) are calculated for each permission using Eq. 1 and Eq. 2 respectively. In Eq. 1, PuB is the total number of times a permission is used in the benign group, and B is the total number of benign samples.

BSP = PuB / B    (1)

In Eq. 2, PuM is the total number of times a permission is used in the malware group, and M is the total number of malware samples.

MSP = PuM / M    (2)

The Effective Maliciousness Score of a Permission (EMSP) is obtained by subtracting the BSP value from the MSP value, which normalizes for permissions that are also common in benign apps, so the resulting value represents the maliciousness weight of the permission. The EMSP is calculated using Eq. 3, where the BSP and MSP values come from Eq. 1 and Eq. 2.

EMSP = MSP − BSP    (3)
7 https://developer.android.com/guide/topics/manifest/manifest-intro.html
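As an illustration of the Permission Extractor component (Section 3.2.1), the following sketch uses Androguard, the same library used in the experimental setup (Section 4.2), to read the requested permissions from an apk and append them to a CSV file. It is a minimal example, not FAMOUS's actual implementation; the import path assumes Androguard 3.x (newer releases moved APK to androguard.core.apk), and the function names are ours.

```python
import csv
from androguard.core.bytecodes.apk import APK  # androguard 3.x import path

def extract_permissions(apk_path):
    """Return all permissions declared in the apk's AndroidManifest.xml."""
    return APK(apk_path).get_permissions()  # e.g. ['android.permission.INTERNET', ...]

def append_to_csv(apk_path, label, csv_path="permissions.csv"):
    """Store one row per apk: package name, class label, then the permission list."""
    apk = APK(apk_path)
    with open(csv_path, "a", newline="") as fh:
        csv.writer(fh).writerow([apk.get_package(), label] + apk.get_permissions())
```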
Figure 4: Scan Result Window of FAMOUS: Showing class label of all selected applications
Figure 5: Block diagram of FAMOUS' architecture (ML-1 to ML-n are different machine learning algorithms and C1 to Cn are the classifiers produced by ML training)
Figure 6: Feature Extraction and Scoring System
Figure 7: A snapshot of AndroidManifest file showing uses-permission element
We have calculated MSP (Eq. 2), BSP (Eq. 1) and EMSP (Eq. 3) for all permissions (including uses-permissions and uses-features), following the Scoring Engine described above, on the prepared dataset (explained in Section 4.1). Table 3 shows the top 25 permissions from each group with their MSP, BSP and EMSP scores. To illustrate the permission scoring process, an example is presented here. Suppose the sample in the example (Ref. Fig. 7) is malware; then the MSP for the INTERNET permission is 5979/5553 = 1.07671 by Eq. 2, where PuM for INTERNET is 5979 (Ref. Table 3) and M, the total number of malware samples, is 5553. Similarly, if the sample (Ref. Fig. 7) is benign, then the BSP for the INTERNET permission is 5536/5818 = 0.9515 by Eq. 1, where PuB for INTERNET is 5536 (Ref. Table 3) and B, the total number of benign samples, is 5818. Once MSP and BSP are calculated for the INTERNET permission, its EMSP is 1.07671 − 0.9515 = 0.1251 by Eq. 3.

The Maliciousness Score (MS) of an Android application is the sum of the EMSP values of the permissions present in the application. The MS of an app is calculated using Eq. 4, where EMSP is a pre-calculated value for each standard Android permission based on the dataset, n is the total number of permissions present in the given Android application and p_i is an individual permission.

MS = Σ_{i=1}^{n} EMSP(p_i)    (4)

Continuing the earlier example for the sample apk (Ref. Fig. 7), the total maliciousness score is 0.1251 + 0.6048 + (−0.0222) + 0.4951 + 0.1694 + 0.1646, obtained by Eq. 4 as the sum of the EMSP values of all 6 permissions present (refer to Table 3 for the other permissions' EMSP values). The EMSP score can be used in two ways: first, to calculate the total maliciousness score of a sample, and second, to create a feature set to train machine learning algorithms (as explained in Section 3.2.3). MS can be used to build a lightweight classifier by comparing it with a threshold value to decide the class label of a given test apk. The threshold can be found in different ways; one method, similar to [36], is to use logistic regression. The proposed work does not use MS-based scoring to build the classifier, hence we have not investigated this direction further. The calculated EMSP is passed to the next component, the Feature Set Generator, which creates a vector representation of all samples based on the permissions present in each sample and their respective EMSP scores.

3.2.3. Feature Set Generator

The Feature Set Generator module depends on the previous two modules, i.e. the Permission Extractor and the Scoring Engine. It takes apk files as input, calls the Permission Extractor to extract permissions, and uses the Scoring Engine's output to assign a value to each permission. Fig. 8 shows a row from the feature set based on the permissions present in the sample apk file (Ref. Fig. 7). The permissions that are present receive their respective EMSP values, whereas all other permissions (features) are assigned zero. This contrasts with earlier permission-based feature sets that used boolean features, in which the permissions present in a manifest get 1 and all others are assigned 0. Eq. 5 is used to map the score to the feature.

I(x, p) = Score(p), if application x has permission p; 0, otherwise    (5)

The process of generating the feature set from the dataset is illustrated in Algorithm 1, which consolidates the actions performed in the three aforementioned steps, i.e. permission extraction, scoring and feature set generation. In Algorithm 1, Ψ and Ω are the collections of benign and malware samples respectively, supplied as input to GenerateFeatureSet(). χ is the feature set data and λ is the set of labels for each instance in χ; χ and λ are the output of GenerateFeatureSet().

In Algorithm 1, lines 2-5 initialize the global variables ρB, ρM, α and β. ρB and ρM are two-dimensional arrays which store each permission (as a string) and its respective count for benign and malware samples. α and β hold the total count of benign and malware samples respectively. Permission extraction and counting for benign and malware samples are done in lines 6-8 and lines 9-11 respectively. The BSP, MSP and EMSP values are calculated for all permissions by looping through the permission array and looking up the respective counts; lines 12-15 give the pseudo-code for this. Finally, the feature set χ is initialized and populated with feature values and the respective class labels in lines 16-21.
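For readers who prefer runnable code over pseudo-code, the sketch below mirrors Algorithm 1 under the assumption that each sample has already been reduced to its list of permission strings (for example by the extractor sketched in Section 3.2.1). Function and variable names are ours, not those of the actual FAMOUS implementation.

```python
from collections import Counter

def emsp_table(benign_perm_lists, malware_perm_lists):
    """Compute EMSP = MSP - BSP for every permission seen in the dataset (Eqs. 1-3)."""
    B, M = len(benign_perm_lists), len(malware_perm_lists)
    pub = Counter(p for perms in benign_perm_lists for p in perms)   # PuB counts
    pum = Counter(p for perms in malware_perm_lists for p in perms)  # PuM counts
    all_perms = sorted(set(pub) | set(pum))
    return {p: pum[p] / M - pub[p] / B for p in all_perms}, all_perms

def feature_vector(perms, emsp, all_perms):
    """Eq. 5: a permission contributes its EMSP score if present, 0 otherwise."""
    present = set(perms)
    return [emsp[p] if p in present else 0.0 for p in all_perms]

def maliciousness_score(perms, emsp):
    """Eq. 4: MS is the sum of the EMSP values of the permissions present in the app."""
    return sum(emsp.get(p, 0.0) for p in set(perms))

def generate_feature_set(benign_perm_lists, malware_perm_lists):
    """Rough equivalent of GenerateFeatureSet(): returns feature rows X and labels y."""
    emsp, all_perms = emsp_table(benign_perm_lists, malware_perm_lists)
    X, y = [], []
    for perms in benign_perm_lists:
        X.append(feature_vector(perms, emsp, all_perms))
        y.append(0)   # benign
    for perms in malware_perm_lists:
        X.append(feature_vector(perms, emsp, all_perms))
        y.append(1)   # malware
    return X, y, emsp, all_perms
```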
Table 3: PuB, PuM, BSP, MSP and EMSP values of the top 25 permissions (sorted by PuM)

Permission | PuM | PuB | BSP | MSP | EMSP
INTERNET | 5979 | 5536 | 0.9515 | 1.0767 | 0.1252
READ_PHONE_STATE | 5463 | 2205 | 0.379 | 0.9838 | 0.6048
ACCESS_NETWORK_STATE | 4440 | 4781 | 0.8218 | 0.7996 | -0.0222
WRITE_EXTERNAL_STORAGE | 4146 | 3386 | 0.582 | 0.7466 | 0.1646
SEND_SMS | 3058 | 274 | 0.0471 | 0.5507 | 0.5036
RECEIVE_BOOT_COMPLETED | 2755 | 6 | 0.001 | 0.4961 | 0.4951
ACCESS_WIFI_STATE | 2563 | 1700 | 0.2922 | 0.4616 | 0.1694
WAKE_LOCK | 2193 | 1780 | 0.3059 | 0.3949 | 0.089
RECEIVE_SMS | 2151 | 169 | 0.029 | 0.3874 | 0.3584
READ_SMS | 2098 | 91 | 0.0156 | 0.3778 | 0.3622
ACCESS_COARSE_LOCATION | 1920 | 1591 | 0.2735 | 0.3458 | 0.0723
ACCESS_FINE_LOCATION | 1780 | 1684 | 0.2894 | 0.3205 | 0.0311
VIBRATE | 1659 | 1 | 0.0002 | 0.2988 | 0.2986
READ_CONTACTS | 1334 | 443 | 0.0761 | 0.2402 | 0.1641
WRITE_SMS | 1246 | 44 | 0.0076 | 0.2244 | 0.2168
CHANGE_WIFI_STATE | 1001 | 201 | 0.0345 | 0.1803 | 0.1458
INSTALL_PACKAGES | 829 | 19 | 0.0033 | 0.1493 | 0.146
GET_TASKS | 821 | 417 | 0.0717 | 0.1478 | 0.0761
RESTART_PACKAGES | 756 | 82 | 0.0141 | 0.1361 | 0.122
CALL_PHONE | 742 | 664 | 0.1141 | 0.1336 | 0.0195
WRITE_SETTINGS | 686 | 208 | 0.0358 | 0.1235 | 0.0877
ACCESS_LOCATION_EXTRA_COMMANDS | 618 | 270 | 0.0464 | 0.1113 | 0.0649
WRITE_APN_SETTINGS | 564 | 8 | 0.0014 | 0.1016 | 0.1002
WRITE_CONTACTS | 546 | 237 | 0.0407 | 0.0983 | 0.0576
SET_WALLPAPER | 530 | 240 | 0.0413 | 0.0954 | 0.0541
Figure 8: A snapshot of the feature set generated based on EMSP and the permissions from Fig. 7
4. Experiments
To test the effectiveness of the EMSP-based feature set and the usability of FAMOUS, two main experiments were carried out. In the first experiment (Experiment-I), the accuracy of different machine learning algorithms with different configurations was tested. In the second experiment (Experiment-II), FAMOUS was tested on real users' devices. The results of both experiments are presented and discussed in Section 5. In the following subsections we describe the dataset used for the experiments (Sec. 4.1), the experimental set-up used for all experiments (Sec. 4.2), the output of a pilot statistical test performed on the dataset (Sec. 4.3) and, finally, the machine learning algorithms and the metrics used to evaluate their performance (Sec. 4.4).

4.1. Dataset

For the purpose of Experiment-I, we amalgamated two datasets. The first dataset (dataset-1) has a total of 11,371 Android applications, with samples from both classes, i.e. 5553 malware and 5818 benign. For this dataset, malware samples were adopted from the DREBIN project [16] and benign samples were downloaded from the PlayDrone archive [40]. The second dataset (dataset-2) has a total of 4317 samples. Its malware samples were gathered from multiple online public archives such as the Contagio dump [37], AndroMalShare [41] and Andrototal [42]. For benign samples we used the PlayDrone collection [40], which provides a list of Google Play apks sorted by download count. We downloaded the top 999 and bottom 979 applications from the sorted list by piping the output of Linux's head and tail commands to grep and wget respectively. Along with this, we also collected 755 samples from different third-party app stores and a torrent collection 8 of 1380 of Google Play's paid apps and games. Table 4 lists the sources and categories of the benign samples. All third-party applications were verified with VirusTotal [43], and we found that 46 (7%) applications were detected as malware by at least one antivirus engine.
8 https://kat.cr/1380-paid-android-apps-and-games-apk-t5344319.html
Algorithm 1 Feature Set Generation
1: procedure GenerateFeatureSet(Ψ, Ω)
2:   ρB[perm, count] ← permString, 0
3:   ρM[perm, count] ← permString, 0
4:   α ← count(Ψ)
5:   β ← count(Ω)
6:   for κ ∈ Ψ do
7:     P ← FetchPerm(κ)
8:     PuB ← UpdatePermCount(ρB[P, count])
9:   for κ ∈ Ω do
10:    P ← FetchPerm(κ)
11:    PuM ← UpdatePermCount(ρM[P, count])
12:  for η ∈ ρ do
13:    BSP[η[permString]] ← η[PuB]/α
14:    MSP[η[permString]] ← η[PuM]/β
15:    EMSP[η[permString]] ← MSP − BSP
16:  Initialize(χ[α + β][count[ρB+M[perm]]], 0)
17:  for µ ∈ {Ψ, Ω} do
18:    γ ← FetchPerm(µ)
19:    λ ← updateLabel(γ)
20:    if permString ∈ γ then
21:      χ[permString] ← Fetch(EMSP[permString])
22:  return (χ, λ)
Table 4: Android apks collected from third-party app stores

Category | 9apps | fdroid & others | Total
Business | 63 | 2 | 65
Education | 73 | 98 | 171
Entertainment | 50 | 0 | 50
Games | 79 | 78 | 157
Lifestyle | 64 | 16 | 80
Multimedia | 20 | 75 | 95
Personalisation | 82 | 55 | 137
Total | | | 755 + 1380† = 2135

†: Torrent collection of Google Play's paid apps and games
We used the MD5 hash to retrieve the scan reports; 130 applications had no scan result and were eliminated, along with the malicious applications, from our dataset, leaving a total of 587 third-party applications in the benign dataset. Likewise, all samples in both datasets were scanned with VirusTotal [43] and the proper class label was assigned accordingly. Duplicate samples in a dataset would skew the accuracy of the experiment, hence we removed duplicates using the MD5 hash and package name and kept only unique samples in each dataset and each category. Dataset-1 was used for scoring, feature set generation and training, whereas dataset-2 was used only for testing. It was ensured that samples present in dataset-2 are not present in dataset-1 and hence were not used for score calculation. This separation reflects the real-world scenario in which new apps appear with new permission patterns.

For Experiment-II, the dataset consisted of end-users' devices. We randomly selected 4 users with smartphones having different hardware and different versions of Android OS. The users were guided to activate the developer options 9 on their phones, and their permission was obtained to activate the USB debugging option on each phone.

4.2. Experimental Setup

An experimental system was prepared to carry out the various experiments on the proposed model. The experimental system ran Ubuntu 14.04 (64-bit) on an Intel Core 2 Duo processor with 4 GB of primary memory and 500 GB of secondary storage. The Python programming language with various modules was used for all experiments. APK processing was done using Androguard [44], a Python-based tool for performing different kinds of processing on Android applications. After collecting the raw dataset (apk files), the class of each apk was verified using the VirusTotal [43] web service, which scans each sample with nearly 55 anti-virus engines in parallel. The decision on the class label was made separately for malware and benign samples based on the number of positive results returned by VirusTotal [43]. For malware we adopted an "if any" criterion, i.e. if any scanning engine flags a given sample as positive (detects it as malware), we consider that sample malware; for benign we adopted an "if all" criterion, i.e. a benign label is attached only if all engines pass the sample as clean. Eq. 6 represents the class labeling (CL) process based on the positive scores of VirusTotal's scanning engines E_i. Each class's samples were stored separately and given as input to the Permission Extractor component (Section 3.2.1). MD5 and SHA hashes, along with package names, were used to identify samples uniquely. Scikit-learn [45], a Python library, was used for training and testing the machine learning algorithms. All statistical calculations and graph plotting were also done with various Python-based modules. To implement FAMOUS as a forensic tool, the wxPython module was used, along with other modules, for developing the Graphical User Interface (GUI).
CL(M|B, S) = M, if (E1(S) ∨ E2(S) ∨ ... ∨ En(S));  B, if (¬E1(S) ∧ ¬E2(S) ∧ ... ∧ ¬En(S))    (6)
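As a concrete reading of Eq. 6, the sketch below applies the same labeling rule to a per-engine scan summary. It is illustrative only: the mapping of engine names to boolean verdicts is assumed to come from the analyst's own VirusTotal report parser, not from a specific API wrapper.

```python
def class_label(engine_results):
    """Eq. 6: 'M' if any engine flags the sample, 'B' only if every engine passes it.

    engine_results: mapping of engine name -> bool
    (True means the engine detected the sample as malicious).
    """
    if any(engine_results.values()):
        return "M"   # at least one positive verdict -> malware
    return "B"       # all engines clean -> benign

# Example: two engines disagree, so the sample is labeled malware.
print(class_label({"EngineA": False, "EngineB": True}))  # -> 'M'
```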
4.3. Statistical test
Statistical tests are good indicators for understanding the relationships between variables.
9 https://developer.android.com/studio/run/device.html
Figure 9: Histogram of file size for malware and benign
Figure 10: Histogram of file count for malware and benign
Before conducting the classification experiments, we wanted to study the similarities and differences between malware and benign samples. To do this, we selected file size, total number of files in the apk and total number of permissions as candidate variables and plotted an overlay histogram of each variable for both malware and benign samples. Fig. 9 shows that malware and benign samples have different file-size patterns: malware samples tend to be smaller, whereas benign samples are larger. The mean file size for the benign category was 6.37 MB with a standard deviation (SD) of 8.52, whereas the mean file size for malware was 1.31 MB with an SD of 2.02. From these mean and SD values it can be concluded that a benign apk has an average size of about 7 MB, whereas malware is much smaller, with an average size of about 2 MB. The larger variation in the benign category also suggests that benign file sizes span a larger spectrum (roughly 0-20 MB in Fig. 9), whereas malware file sizes do not vary much, with about 8 MB appearing to be the largest size. An Android apk is a compressed archive (very similar to zip compression) of the many files present in an Android application, including all resource files under the res directory, dex files, the AndroidManifest and other required files. We considered the number of files present in an apk as a candidate variable and counted it for all samples available for the experiment. Each apk file was decompressed, all files were counted, and the file-count values were plotted as an overlay histogram, shown in Fig. 10. The histogram reveals that malware uses far fewer files, whereas benign samples contain a larger number of files. The mean number of files for the benign category was 342, whereas the mean for the malware category was only 90. The number of files in each category correlates with file size, i.e. malware has fewer files and hence a smaller size, whereas benign apps contain more files, resulting in a larger size.
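The overlay histograms of Figs. 9-11 can be reproduced with a few lines of matplotlib; the sketch below assumes the per-sample statistics (file size in MB, file count, or permission count) have already been collected into two lists, one per class. The variable and function names are illustrative, not taken from the authors' scripts.

```python
import matplotlib.pyplot as plt

def overlay_histogram(benign_values, malware_values, xlabel, bins=30):
    """Overlay two semi-transparent histograms so class differences stay visible."""
    plt.hist(benign_values, bins=bins, alpha=0.5, label="benign")
    plt.hist(malware_values, bins=bins, alpha=0.5, label="malware")
    plt.xlabel(xlabel)
    plt.ylabel("number of samples")
    plt.legend()
    plt.show()

# e.g. overlay_histogram(benign_sizes_mb, malware_sizes_mb, "apk size (MB)")
```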
Figure 11: Histogram of total permissions requested by malware and benign
Android uses a permission system to provide security to the user, where every application must declare the respective permissions to access a service. During installation, all permissions required by an application are shown to the user, and installation proceeds only if the user accepts them; otherwise installation is aborted. Studying the permission patterns of malware and benign apps helps to find methods and thresholds for building a classification system. All permissions were extracted from every apk, and the permission counts for the malware and benign categories were used to create an overlay histogram. Fig. 11 shows the comparison of permission counts for the malware and benign categories. The mean permission count for the benign category was 6.73 with an SD of 5.66, whereas the mean for the malware category was 12.43 with an SD of 7.96. It can be inferred that the malware permission count is nearly double that of the benign category. The graph clearly shows that the permission count for benign apps is concentrated in the range 0-15 and is very low towards higher bins. Malware samples are present at the lower bins but have a significantly higher presence at the higher bins, where the benign presence is negligible.

After studying the aforementioned variables for their statistical properties, experiments were conducted to understand the pattern of EMSP-based and BSP & MSP-based scoring. To study the effect of these scores on the malware and benign categories, histograms were again used as the visualization tool. Each sample's individual permission scores were calculated as explained in Section 3.2.2, and the total EMSP and BSP & MSP were calculated according to Eq. 4. The total scores for both (EMSP and BSP & MSP) were plotted for the malware and benign categories. Fig. 12 shows the histogram based on EMSP scoring and Fig. 13 shows the histogram based on BSP and MSP scoring. In Fig. 12, malware clusters at the higher-value bins whereas benign samples cluster towards the lower-value bins. Benign samples have a mean value of 0.70 whereas malware has a mean value of 1.99, and bin 2 can be seen as a threshold which clearly separates benign and malware values.

Figure 12: Histogram of total EMSP for malware and benign apks

Figure 13: Histogram of total BSP and MSP for malware and benign

4.4. Classifier Training

On the basis of the type of learning, machine learning algorithms are classified into three main categories: supervised, unsupervised and semi-supervised. A supervised learning algorithm requires class labels along with the feature set, while an unsupervised learning algorithm uses only the feature set and learns the classes by itself. Semi-supervised learning is a mixture of the two: it starts learning with a small labeled sample, then uses the learned model to label the rest of the unlabelled dataset and uses those labels to update the model. Supervised learning is mainly suitable for classification tasks, while unsupervised learning suits clustering tasks. Labelling an Android application as benign or malware is a classification task, and hence we adopted a supervised learning approach (moreover, we have a labeled dataset). The selection of supervised learning algorithms is based on methods used in earlier works and aims at full coverage of different groups of algorithms, such as tree-based, probability-based and ensemble-based algorithms. We selected Decision Tree (DT), Naive Bayes (NB), K-Nearest Neighbours (kNN), Support Vector Machine (SVM), Random Forest (RF) and Boosting to train on the proposed dataset. Each of these algorithms was trained on the dataset, its performance was evaluated on accuracy, True Positive Rate (TPR) and False Positive Rate (FPR), and their Receiver Operating Characteristic (ROC) curves and Area Under the Curve (AUC) were compared. The experimental set-up gives details about the implementation and parameter tuning of the aforementioned algorithms.

5. Results and Discussions

In this section the results of Experiment-I and Experiment-II are presented and the observed findings are listed.

5.1. Experiment-I: Machine learning classifier performance test

In Experiment-I, we compared the performance of all six selected machine learning algorithms by splitting dataset-1 into training and testing sets with a ratio of 70% and 30% respectively. Fig. 14 shows the ROC curves and AUC values of all selected classifiers. It can be observed that Random Forest (RF) performs best among all classifiers. This result was achieved using 100 estimators during training. To find the optimum number of estimators, Random Forest was trained again with the same training and testing data while varying the number of estimators (10, 50, 100, 150, 200). Fig. 15 shows the ROC and AUC values of RF with different numbers of estimators. With 100 estimators RF achieves its maximum AUC value, i.e. 99.0%, and there is no significant change in AUC above 100 estimators, so this configuration is used for the RF-based classifier in FAMOUS. We also compared the performance of the proposed feature set with the boolean permission-based feature set. Fig. 16 shows the ROC and AUC values of all classifiers on boolean features. Table 5 shows the comparison of accuracy, precision, recall and F1-score for the EMSP-based and boolean-based feature sets for all six selected algorithms.
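The Experiment-I protocol described above (a 70/30 split of dataset-1, six classifiers, RF with 100 trees, ROC/AUC comparison) corresponds to a standard scikit-learn workflow. The sketch below is an illustrative reconstruction, not the authors' script; X and y are assumed to be the EMSP feature matrix and labels produced earlier.

```python
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score

# X: EMSP-valued feature vectors, y: 0 = benign, 1 = malware (assumed available)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)

clf = RandomForestClassifier(n_estimators=100)   # 100 trees, as in the paper
clf.fit(X_train, y_train)

pred = clf.predict(X_test)
scores = clf.predict_proba(X_test)[:, 1]         # probability of the malware class
print("accuracy:", accuracy_score(y_test, pred))
print("AUC:", roc_auc_score(y_test, scores))
```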
Figure 16: ROC of six classifiers on Boolean features
From Table 5 it is evident that the EMSP-based feature set performs better than the boolean-based feature set. It is also observed that kNN and SVM give better results on the boolean feature set. After experimenting with the train/test split of dataset-1 on the EMSP and boolean features, experiments were carried out to test the performance of the aforementioned machine learning algorithms on dataset-2 (the test dataset). As explained in Sec. 4.1, dataset-1 is used to calculate the EMSP, so even after the train-test split of dataset-1 there is a risk of over-fitting due to the presence of the samples on which the score was calculated. To make the classifiers robust and accurate on unseen data, testing the algorithms with a separate dataset is recommended in the literature. Dataset-2 contains samples that are not included in dataset-1. Dataset-1 was used for training and dataset-2 as the test dataset, again with both the EMSP and boolean feature representations. Fig. 17 illustrates the ROC and AUC for the classifiers on the test dataset (dataset-2) with the EMSP feature set, while Fig. 18 shows the ROC and AUC on the boolean feature set. Table 6 lists the performance of the algorithms on the different metrics, with values given for both the EMSP and boolean feature sets. It is evident from Table 5 and Table 6 that performance drops for both the proposed and the boolean feature sets on the test dataset. The major observation is that the difference in performance still holds the same relation as in the train-test split, i.e. the proposed feature set performs better than the boolean feature set. The loss in performance is attributed to "unknown samples", i.e. samples which appear in the testing dataset but are absent from the training dataset. The performance of the proposed classifier was also compared with earlier similar works, focusing on the accuracy and FPR metrics. The best results of all earlier works with permission-based feature sets were compared with the best result of the proposed work. Table 7 shows the comparison; the last column gives the name of the machine learning algorithm which achieved the best performance.
Figure 14: ROC for six different classifiers on the EMSP-based feature set
Figure 15: ROC for five different values of the number of trees in Random Forest
Table 5: Classifier performance on the dataset split (70%-30%) with EMSP and boolean features

Classifier | Accuracy (EMSP / Boolean) | Precision (EMSP / Boolean) | Recall (EMSP / Boolean) | F1-score (EMSP / Boolean)
RF | 94.84 / 93.70 | 0.95 / 0.93 | 0.95 / 0.93 | 0.95 / 0.93
DT | 93.17 / 92.20 | 0.93 / 0.92 | 0.93 / 0.92 | 0.93 / 0.92
NB | 77.22 / 75.26 | 0.81 / 0.80 | 0.77 / 0.75 | 0.76 / 0.74
kNN | 92.44 / 93.23 | 0.92 / 0.93 | 0.92 / 0.93 | 0.92 / 0.93
SVM | 86.48 / 91.44 | 0.87 / 0.92 | 0.86 / 0.91 | 0.86 / 0.91
AdaBoost | 91.32 / 90.27 | 0.91 / 0.90 | 0.91 / 0.90 | 0.91 / 0.90
Table 6: Classifier performance on the test dataset with EMSP and boolean features

Classifier | Accuracy (EMSP / Boolean) | Precision (EMSP / Boolean) | Recall (EMSP / Boolean) | F1-score (EMSP / Boolean)
RF | 91.52 / 91.31 | 0.94 / 0.94 | 0.92 / 0.91 | 0.93 / 0.92
DT | 88.95 / 88.21 | 0.94 / 0.94 | 0.89 / 0.88 | 0.91 / 0.9
NB | 89.34 / 89.83 | 0.93 / 0.91 | 0.89 / 0.9 | 0.91 / 0.9
kNN | 87.4 / 91.15 | 0.94 / 0.94 | 0.87 / 0.91 | 0.9 / 0.92
SVM | 84.73 / 89.16 | 0.94 / 0.94 | 0.85 / 0.89 | 0.88 / 0.91
AdaBoost | 86.5 / 89.53 | 0.94 / 0.94 | 0.86 / 0.9 | 0.89 / 0.91

Figure 17: ROC curve on the test dataset with EMSP features
Figure 18: ROC curve on Test dataset with Boolean features
the comparison and the last column mentioned the name of machine learning algorithms which achieved the best performance. It can be observed from the Table 7 that proposed feature set outperformed all other only permission-based feature sets. The study of the computational performance of classifiers is crucial for making any applications such as the proposed FAMOUS tool. Various factors influence the computational performance such as number of features, model complexity, feature extraction and input data representation and sparsity. This work differs in input data representation so the computational performance is compared in term of training and testing time of all classifiers in seconds for the boolean and the proposed EMSP feature set. For computational performance comparison, we used Android Malware Classification dataset offered under CDMC 2016 competition. From the given samples, We randomly selected 6448 malware and 6118 benign sample for measuring the computational performance. The CDMC 2016 dataset provides extracted permissions from the APK, so we do not have to perform permission extraction. After selecting the sample and respective permissions, we preprocessed and created the boolean and EMSP feature set as explained in section 3.2.2 and 3.2.3. We trained and tested all the classifiers on both the feature set and logged the training and testing time as shown in Table 8. The training time for EMSP based feature set is slightly better for few classifiers but boolean feature set have better testing time for all the classifiers.
devices. The applications with the suspicious label are those which are flagged malicious by the underlying classifier and so these can be triaged for further analysis. Output window has the option to select listed applications with their class label and after selection, those can be moved to a separate folder on analyst’s system which will help to go for further analysis. 5.2.2. FAMOUS: Operational result FAMOUS is built with the objective to assist forensic analyst by triaging the applications into a category which enables them to make further decision. To achieve the objective, along with accuracy it is also important to be quick in extraction and scanning of applications. The accuracy of FAMOUS mainly depends upon the underlying classifiers that are explained in Section 5.1. In this section, we have explained the time taken by FAMOUS on different experimental devices. Table 9 list out all the devices with their make and model, applications pulling time and scanning time taken by FAMOUS. Pulling and scanning time is average of five runs which is done to overcome any bias. It can be observed from Table 9 that FAMOUS is fast in pulling and scanning the applications from attached devices. It took an average of 14 minutes to pulled all applications from attached device and 4 seconds to scan them. Time given in Table 9 is calculated by averaging all the applications size and total time is taken, so it can vary with devices due to installing applications’ size. Table 9 also shows the number of suspicious apks which were identified as suspicious applications by FAMOUS. These apks have a high probability of being quarantined as malware with further triaging with a human expert.
5.2. Experiment-II:FAMOUS performance test In Experiment-II, the performance of FAMOUS is tested with live Android devices. We acquired Android-based smartphone from random users and performed the scanning of their devices with FAMOUS. In this section, various aspects of experiment and result are presented and discussed. Before scanning the devices, we requested users of the device to activate the developer options on their smartphone. We used USB enabled debugging mode with ADB to connect end user’s devices to our test system. Once the connection was up and running, FAMOUS was executed with various configurations.
6. Conclusions and Future Directions
The permission system is one of the security mechanisms adopted in the Android OS. We aimed to build a triaging forensic tool using Android applications' permissions. The proposed forensic tool, FAMOUS, utilises machine learning-based classifiers. The underlying classifiers are trained with the proposed feature set, EMSP, which is calculated from the statistical properties of the permission patterns present in malware and benign samples. The EMSP-based feature set gives better accuracy than the boolean permission-based features; the difference is nearly 2% between the EMSP and boolean feature sets. The main reason for this improvement is that a floating-point feature value encodes more information than a boolean value; in other words, the floating-point feature space is denser and more compact than the boolean feature space. By using ML-based classifiers, FAMOUS is able to handle unknown and zero-day Android malware. FAMOUS's predictive technique will help forensic experts to detect new malware before any signature is created and will speed up analysis tasks. The major conclusions derived from the proposed model are listed below:
5.2.1. FAMOUS: GUI Interface
We implemented the proposed approach as a forensic tool named FAMOUS. Fig. 3 shows the initial screen of FAMOUS, which lists all the installed applications of the attached device along with their size and type. The type of an application is decided on the basis of its install location: if an application is installed on the system partition it is considered a system application; in all other cases it is considered a third-party application. The initial screen of FAMOUS has options to select individual applications or to select them by type. Once the analyst selects the application(s), the scan option scans all selected applications with the underlying classifiers. The scan result of all selected applications is displayed along with other metadata such as package, version, total permissions, and EMSP score. Fig. 4 shows the output of applications selected from one of the attached experimental
• By understanding and observing the statistical properties of permission patterns in malware and benign samples, a score-based feature set was created which improves the classification accuracy by 2% in comparison with the existing boolean features.
Table 7: Performance Comparison of Proposed Work with Earlier Works

Works                     Accuracy (%)   FPR (%)   TPR (%)   AUC     Algorithm
PUMA [7]                  86.41          0.19      0.91      0.92    RF50
MAMA [16]                 87.41          0.05      0.80      0.95    RF100
Yerima et al. [31]        92.1           0.06      0.90      97.23   Bayesian
Peiravian et al. [32]     93.60          0.92      0.88      0.95    Bagging
Peiravian et al.+# [32]   96.39          94.9      94.1      0.991   Bagging
Milosevic et al.# [37]    89.4           89.5      84.4      —       Ensemble
PIndroid* [20]            99.8           0.011     99.8      99.8    Ensemble
FAMOUS                    94.84          0.95      0.95      0.94    RF
Note: +: permission-based features used along with API calls. *: permission-based features used along with Intent-based features. #: instead of FPR, TPR and AUC, the columns show precision, recall and F-score. —: value not available.

Table 8: Comparison of Computational Performance
Classifier   Training Boolean (Sec.)   Training EMSP (Sec.)   Testing Boolean (Sec.)   Testing EMSP (Sec.)
RF           13.476                    13.027                 0.882                    0.906
DT           0.094                     0.098                  0.003                    0.006
NB           0.065                     0.055                  0.025                    0.026
kNN          0.180                     0.179                  2.009                    1.573
SVM          24.200                    25.432                 8.382                    8.941
AdaBoost     11.426                    11.702                 0.709                    0.745
[4] P. Faruki, V. Ganmoor, V. Laxmi, M. S. Gaur, A. Bharmal, Androsimilar: robust statistical feature signature for android malware detection, in: Proceedings of the 6th International Conference on Security of Information and Networks, ACM, 2013, pp. 152–159.
[5] M. Zheng, M. Sun, J. C. Lui, Droid analytics: a signature based analytic system to collect, extract, analyze and associate android malware, in: 2013 12th IEEE International Conference on Trust, Security and Privacy in Computing and Communications, IEEE, 2013, pp. 163–171.
[6] R. Sato, D. Chiba, S. Goto, Detecting android malware by analyzing manifest files, Proceedings of the Asia-Pacific Advanced Network 36 (2013) 23–31.
[7] C.-Y. Huang, Y.-T. Tsai, C.-H. Hsu, Performance evaluation on permission-based detection for android malware, in: Advances in Intelligent Systems and Applications - Volume 2, Springer, 2013, pp. 111–120.
[8] B. Sanz, I. Santos, C. Laorden, X. Ugarte-Pedrero, P. G. Bringas, G. Álvarez, PUMA: Permission usage to detect malware in android, in: International Joint Conference CISIS'12-ICEUTE'12-SOCO'12 Special Sessions, Springer, 2013, pp. 289–298.
[9] D.-J. Wu, C.-H. Mao, T.-E. Wei, H.-M. Lee, K.-P. Wu, Droidmat: Android malware detection through manifest and api calls tracing, in: Information Security (Asia JCIS), 2012 Seventh Asia Joint Conference on, IEEE, 2012, pp. 62–69.
[10] M. S. Alam, S. T. Vuong, Random forest classification for detecting android malware, in: Green Computing and Communications (GreenCom), 2013 IEEE and Internet of Things (iThings/CPSCom), IEEE International Conference on and IEEE Cyber, Physical and Social Computing, IEEE, 2013, pp. 663–669.
[11] P. P. Chan, W.-K. Song, Static detection of android malware by using permissions and api calls, in: 2014 International Conference on Machine Learning and Cybernetics, Vol. 1, IEEE, 2014, pp. 82–87.
[12] H.-Y. Chuang, S.-D. Wang, Machine learning based hybrid behavior models for android malware analysis, in: Software Quality, Reliability and Security (QRS), 2015 IEEE International Conference on, IEEE, 2015, pp. 201–206.
[13] M. Ghorbanzadeh, Y. Chen, Z. Ma, T. C. Clancy, R. McGwier, A neural network approach to category validation of android applications, in: Computing, Networking and Communications (ICNC), 2013 International Conference on, IEEE, 2013, pp. 740–744.
[14] W. Glodek, R. Harang, Rapid permissions-based detection and analysis of mobile malware using random decision forests, in: MILCOM 2013 - 2013 IEEE Military Communications Conference, IEEE, 2013, pp. 980–985.
[15] H.-S. Ham, M.-J. Choi, Analysis of android malware detection performance using machine learning classifiers, in: 2013 International Conference on ICT Convergence (ICTC), IEEE, 2013, pp. 490–495.
[16] D. Arp, M. Spreitzenbarth, M. Hubner, H. Gascon, K. Rieck, Drebin: Effective and explainable detection of android malware in your pocket, in: 21st Annual Network and Distributed System Security Symposium (NDSS), 2014.
[17] B. Sanz, I. Santos, C. Laorden, X. Ugarte-Pedrero, J. Nieves, P. G. Bringas, G. Álvarez Marañón, MAMA: manifest analysis for malware detection in android, Cybernetics and Systems 44 (6-7) (2013) 469–488.
[18] A. Shabtai, Y. Fledel, Y. Elovici, Automated static code analysis for classifying android applications using machine learning, in: Computational Intelligence and Security (CIS), 2010 International Conference on, IEEE, 2010, pp. 329–333.
• A machine learning based predictive classifier was built to detect the maliciousness of Android applications.
• A forensic tool entitled FAMOUS was built with a machine learning classifier; it assists the analyst in triaging unknown and zero-day Android malware without the prerequisite of a signature.
Currently, FAMOUS uses only the application's permission-based features, which shall be complemented with other static features such as strings, function calls and application metadata in future work. Dynamic features are not considered in the current version of the work and shall be added as a future direction. An ensemble of classifiers gives better accuracy than an individual classifier, so future work will exploit ensemble techniques (ensemble classifiers built with static and dynamic features) to improve the classification accuracy further. The FAMOUS implementation is a prototype built as a proof-of-concept of the proposed model. Future work will also focus on developing FAMOUS to an industrial standard, complying with complete digital forensic guidelines.
References
[1] Y. Zhou, X. Jiang, Dissecting android malware: Characterization and evolution, in: Security and Privacy (SP), 2012 IEEE Symposium on, IEEE, 2012, pp. 95–109.
[2] W. J. Buchanan, S. Chiale, R. Macfarlane, A methodology for the security evaluation within third-party android marketplaces, Digital Investigation.
[3] C. J. D'Orazio, K.-K. R. Choo, Circumventing iOS security mechanisms for APT forensic investigations: A security taxonomy for cloud apps, Future Generation Computer Systems.
Table 9: FAMOUS: scanning results on four use-case devices

Device (Model & Version)   Total Apks   Pulling time (Mins.)   Scanning time (Sec.)   Suspicious Apks
GalaxyJ1-4.4.4             166          13.58                  2.67                   13
Micromax-A096-5.0.2        153          12.05                  2.54                   17
Lenovo-K50-5.0             249          15.62                  4.25                   26
Galaxy-SM-5.0.2            233          14.11                  3.10                   24
[19] W. Wang, Y. Li, X. Wang, J. Liu, X. Zhang, Detecting android malicious apps and categorizing benign apps with ensemble of classifiers, Future Generation Computer Systems.
[20] A. Feizollah, N. B. Anuar, R. Salleh, A. W. A. Wahab, A review on feature selection in mobile malware detection, Digital Investigation 13 (2015) 22–37.
[21] S.-H. Seo, A. Gupta, A. M. Sallam, E. Bertino, K. Yim, Detecting mobile malware threats to homeland security through static analysis, Journal of Network and Computer Applications 38 (2014) 43–53.
[22] F. Marturana, S. Tacconi, A machine learning-based triage methodology for automated categorization of digital media, Digital Investigation 10 (2) (2013) 193–204.
[23] F. Di Cerbo, A. Girardello, F. Michahelles, S. Voronkova, Detection of malicious applications on android os, in: Computational Forensics, Springer, 2010, pp. 138–149.
[24] J. Clemens, Automatic classification of object code using machine learning, Digital Investigation 14 (2015) S156–S162.
[25] A. P. Felt, E. Chin, S. Hanna, D. Song, D. Wagner, Android permissions demystified, in: Proceedings of the 18th ACM Conference on Computer and Communications Security, ACM, 2011, pp. 627–638.
[26] D. Geneiatakis, I. N. Fovino, I. Kounelis, P. Stirparo, A permission verification approach for android mobile applications, Computers & Security 49 (2015) 192–205.
[27] Z. Fang, W. Han, Y. Li, Permission based android security: Issues and countermeasures, Computers & Security 43 (2014) 205–218.
[28] V. Moonsamy, J. Rong, S. Liu, Mining permission patterns for contrasting clean and malicious android applications, Future Generation Computer Systems 36 (2014) 122–132.
[29] B. Sanz, I. Santos, C. Laorden, X. Ugarte-Pedrero, P. G. Bringas, On the automatic categorisation of android applications, in: Consumer Communications and Networking Conference (CCNC), 2012 IEEE, IEEE, 2012, pp. 149–153.
[30] X. Wang, W. Wang, Y. He, J. Liu, Z. Han, X. Zhang, Characterizing android apps behavior for effective detection of malapps at large scale, Future Generation Computer Systems 75 (2017) 30–45.
[31] A. A. A. Samra, K. Yim, O. A. Ghanem, Analysis of clustering technique in android malware detection, in: Innovative Mobile and Internet Services in Ubiquitous Computing (IMIS), 2013 Seventh International Conference on, IEEE, 2013, pp. 729–733.
[32] S. Y. Yerima, S. Sezer, G. McWilliams, I. Muttik, A new android malware detection approach using bayesian classification, in: Advanced Information Networking and Applications (AINA), 2013 IEEE 27th International Conference on, IEEE, 2013, pp. 121–128.
[33] N. Peiravian, X. Zhu, Machine learning for android malware detection using permission and api calls, in: Tools with Artificial Intelligence (ICTAI), 2013 IEEE 25th International Conference on, IEEE, 2013, pp. 300–305.
[34] N. Milosevic, A. Dehghantanha, K.-K. R. Choo, Machine learning aided android malware classification, Computers & Electrical Engineering.
[35] F. Idrees, M. Rajarajan, M. Conti, T. M. Chen, Y. Rahulamathavan, PIndroid: A novel android malware detection system using ensemble learning methods, Computers & Security 68 (2017) 36–46.
[36] K. A. Talha, D. I. Alper, C. Aydin, APK auditor: Permission-based android malware detection system, Digital Investigation 13 (2015) 1–14.
[37] M. Parkour, Contagio (2016). URL http://contagiodump.blogspot.com.
[38] G. Dini, F. Martinelli, I. Matteucci, M. Petrocchi, A. Saracino, D. Sgandurra, Risk analysis of android applications: a user-centric solution, Future Generation Computer Systems.
[39] F. Tchakounté, Permission-based malware detection mechanisms on android: Analysis and perspectives, Journal of Computer Science 1 (2).
[40] N. Viennot, E. Garcia, J. Nieh, A measurement study of google play, in: ACM SIGMETRICS Performance Evaluation Review, Vol. 42, ACM, 2014, pp. 221–233.
[41] B. R. Team, et al., SandDroid: An APK analysis sandbox, Xi'an Jiaotong University (2014).
[42] F. Maggi, A. Valdi, S. Zanero, Andrototal: a flexible, scalable toolbox and service for testing mobile malware detectors, in: Proceedings of the Third ACM Workshop on Security and Privacy in Smartphones & Mobile Devices (SPSM), ACM, 2013, pp. 49–54.
[43] VirusTotal, VirusTotal - free online virus, malware and URL scanner (2004). URL https://www.virustotal.com/
[44] A. Desnos, androguard/androguard (2012). URL https://github.com/androguard/androguard
[45] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, E. Duchesnay, Scikit-learn: Machine learning in Python, Journal of Machine Learning Research 12 (2011) 2825–2830.
Appendix A. Values of PuB, PuM, MSP, BSP and EMSP
The permissions option prints the permissions required by the application (referred to by the package name defined in the Android manifest file). Note that only the explicit permissions declared in the manifest are listed by this command.
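For illustration, a minimal sketch of such an extraction with Androguard [44] is given below; the module path and method names follow the classic Androguard API and may differ in newer releases, and "sample.apk" is only a placeholder path.

from androguard.core.bytecodes.apk import APK

def declared_permissions(apk_path):
    apk = APK(apk_path)
    # Only permissions explicitly declared in AndroidManifest.xml are returned;
    # nothing granted implicitly at runtime is visible here.
    return apk.get_package(), sorted(apk.get_permissions())

pkg, perms = declared_permissions("sample.apk")  # "sample.apk" is a placeholder path
print(pkg)
for p in perms:
    print(" ", p)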
Table A.10: PuB, PuM, BSP, MSP and EMSP values of the top 26-50 permissions, sorted on malware values

Permission                  PuM   PuB    BSP      MSP      EMSP
READ LOGS                   522   158    0.0272   0.094    0.0668
DISABLE KEYGUARD            471   126    0.0217   0.0848   0.0631
GET ACCOUNTS                447   1271   0.2185   0.0805   -0.138
CHANGE NETWORK STATE        403   82     0.0141   0.0726   0.0585
SYSTEM ALERT WINDOW         345   202    0.0347   0.0621   0.0274
READ EXTERNAL STORAGE       333   447    0.0768   0.06     -0.0168
DELETE PACKAGES             270   10     0.0017   0.0486   0.0469
MOUNT UNMOUNT FILESYSTEMS   240   111    0.0191   0.0432   0.0241
PROCESS OUTGOING CALLS      231   68     0.0117   0.0416   0.0299
CAMERA                      229   2      0.0003   0.0412   0.0409
RECEIVE WAP PUSH            220   4      0.0007   0.0396   0.0389
BLUETOOTH                   206   125    0.0215   0.0371   0.0156
RECEIVE MMS                 186   7      0.0012   0.0335   0.0323
BLUETOOTH ADMIN             184   81     0.0139   0.0331   0.0192
CHANGE CONFIGURATION        183   83     0.0143   0.033    0.0187
UPDATE DEVICE STATS         166   8      0.0014   0.0299   0.0285
ACCESS COARSE UPDATES       163   5      0.0009   0.0294   0.0285
KILL BACKGROUND PROCESSES   147   56     0.0096   0.0265   0.0169
MODIFY PHONE STATE          128   12     0.0021   0.0231   0.021
RECORD AUDIO                124   502    0.0863   0.0223   -0.064
DELETE CACHE FILES          119   7      0.0012   0.0214   0.0202
ACCESS CACHE FILESYSTEM     114   1      0.0002   0.0205   0.0203
ACCESS GPS                  111   41     0.007    0.02     0.013
FLASHLIGHT                  101   169    0.029    0.0182   -0.0108
ACCESS LOCATION             100   33     0.0057   0.018    0.0123
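As a reading aid, the tabulated values are consistent with EMSP being the difference between MSP and BSP for each permission; the short check below verifies this for the first three rows. This is our reading of the table, not the formal definition given in Section 3.2.3.

rows = [
    # (permission, PuM, PuB, BSP, MSP, EMSP) -- first three rows of Table A.10
    ("READ_LOGS",        522,  158, 0.0272, 0.0940,  0.0668),
    ("DISABLE_KEYGUARD", 471,  126, 0.0217, 0.0848,  0.0631),
    ("GET_ACCOUNTS",     447, 1271, 0.2185, 0.0805, -0.1380),
]
for name, pum, pub, bsp, msp, emsp in rows:
    # EMSP appears to equal MSP - BSP (within rounding of the published values).
    assert abs((msp - bsp) - emsp) < 1e-3, name
    print(f"{name}: MSP - BSP = {msp - bsp:+.4f}  (tabulated EMSP {emsp:+.4f})")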
Author Biographies
Mr. Ajit Kumar is a Ph.D. research scholar in the Department of Computer Science at Pondicherry Central University, Puducherry, India. He completed his Master's degree in Computer Science in 2011 and his Bachelor's degree in Computer Applications in 2009. His research interests include cyber security, malware classification and machine learning. He has published 3 research articles in international journals and 6 papers at international conferences related to malware and machine learning.
Dr. K. S. Kuppusamy is an Assistant Professor of Computer Science at Pondicherry Central University, India. He received his Ph.D. in Computer Science and Engineering in 2013 and his Master's degree in Computer Science and Information Technology in 2005. His research interests include accessible computing, security and accessibility. He has published more than 25 papers in various international journals and conferences. He is the recipient of the Best Teacher award for the years 2010, 2011, 2013, 2015 and 2016.
Dr. G. Aghila is a Professor in the Department of Computer Science and Engineering, National Institute of Technology Puducherry, Karaikal. She has a total of 26 years of teaching experience. She received her M.E. (Computer Science and Engineering) and Ph.D. from Anna University, Chennai, India. She has published nearly 70 research papers on web crawlers and ontology-based information retrieval. She has successfully guided 7 Ph.D. scholars. She was the recipient of the Schrneiger award. She is an expert in ontology development. Her areas of interest include intelligent information management, artificial intelligence, text mining and semantic web technologies.
Highlights
• Presents statistical properties of permission patterns, size and file count in malicious and benign Android applications.
• Provides a score-based feature set built with permissions present in malware and benign Android applications.
• A machine learning based predictive classifier is built to detect the maliciousness of Android applications.
• A forensic GUI tool, FAMOUS, is built with the best performing machine learning classifier to triage suspicious applications.