Security importance assessment for system objects and malware detection

Weixuan Mao a, Zhongmin Cai a,*, Don Towsley b, Qian Feng c, Xiaohong Guan a

a Key Laboratory for Intelligent Networks and Network Security, Xi'an Jiaotong University, Xi'an, China
b College of Information and Computer Sciences, University of Massachusetts, Amherst, MA, USA
c Department of Electrical Engineering & Computer Science, Syracuse University, Syracuse, NY, USA
ARTICLE INFO

Article history:
Received 23 July 2016
Received in revised form 23 December 2016
Accepted 17 February 2017
Available online 27 March 2017

Keywords:
Importance metric
Access behavior
Security dependency network
Malware detection
Behavioral-based detection

ABSTRACT

System objects play different roles in computer systems and exhibit different levels of importance to system security. Assessing the importance of system objects helps us develop effective security protection methods. However, little work has focused on understanding and assessing the importance of system objects from a security perspective. In this paper, we build a security dependency network from access behaviors to quantify the security importance of system objects from a system-wide perspective. Similar to other networked systems, we observe the small-world effect and power-law distributions for in- and out-degree in the security dependency network. Exploring rich network structures in the security dependency network provides insights into the importance of system objects in security. We assess the importance of system objects, with respect to security, by centrality metrics and propose an importance based model for malware detection. We evaluate importance metrics of system objects from various perspectives to demonstrate their feasibility and practicality. Furthermore, extensive experimental results on a real-world dataset demonstrate that our model is capable of detecting 7257 malware samples from 27,840 benign processes with a 93.92% true positive rate at a 0.1% false positive rate.

© 2017 Elsevier Ltd. All rights reserved.
1. Introduction
Protection of confidentiality, integrity and availability in a real computer system is a challenging task. The increasing number of system objects and complex interactions between them make fine-grained and accurate protection of each system object time consuming and error prone. One of the main difficulties in protecting real operating systems is the lack of understanding of the role of system
objects with respect to security, in the face of hundreds of thousands of system objects in real operating systems. Previous studies divided system objects into groups based on experience, such as grouping by integrity levels, or file directories, and devised policies for each group of system objects (Bhatkar et al., 2006; Fattori et al., 2015; Lanzi et al., 2010; Sun et al., 2008; Sze and Sekar, 2015). However, such experience based group approaches are still coarse in securing real operating systems. Moreover, different systems have different security requirements. It is insufficient and inflexible to devise security policies
for system protection relying heavily on human knowledge and involvement (Bell and LaPadula, 1973; Biba, 1977; Fraser, 2000; Lanzi et al., 2010; MICROSOFT, 2014). In such a context, a systematic and automatic security importance assessment for system objects is desirable and necessary.

In a real system, the security of one object usually depends on that of others, which leads to different levels of security importance. We refer to the security importance of a system object as the impact that the compromise of a security attribute of the object has on the security attributes of other objects. For example, the compromise of ntdll.dll will undermine the integrity of the large number of processes that load this dynamic link library (dll) file, while the compromise of zlib.dll will only undermine the integrity of the data compression process. Thus, ntdll.dll has a greater impact and is more important than zlib.dll with respect to integrity.

The community will greatly benefit from knowledge of the security importance of system objects. First, it provides a basis for devising fine-grained security policies to protect the most important objects. Second, it helps us assess the impact of a security incident by the importance level of the infected objects. Third, it provides a way of discriminating malicious processes from benign ones by examining their behaviors against the security policies of security models, such as the Biba model (Biba, 1977). However, little previous work focuses on systematically assessing the importance of system objects from the security perspective.

We observe that the interactions between various system objects such as processes, files and registries lead to dependencies between their security attributes. For example, a read operation makes the integrity of a process depend on that of a file. The integrity of the file may further depend on the processes that write to it. As a result, the dependencies between the security attributes of various system objects in a real system are interwoven. Different objects have different levels of security importance due to the different roles they play in the running of a computer system.

This paper develops a networked approach to assess the security importance of system objects from a system-wide perspective. We construct a dependency network of system objects from observed access behaviors, and analyze their security importance utilizing information encoded in various network structures. Inspired by these observations and analyses, we then leverage several centrality metrics from network science (Newman, 2010) to assess the importance of system objects in the dependency network. Furthermore, we propose a novel importance based malware detection method as a validation of the proposed importance metrics. We construct importance metric based behavioral profiles to characterize the access behaviors of processes, and leverage statistical learning techniques to discriminate malicious processes from benign ones without relying on human knowledge and involvement. Our experimental results demonstrate that our approach is capable of detecting 7257 malware samples from 27,840 benign processes at a 93.92% true positive rate under a 0.1% false positive rate, and further indicate the feasibility of our networked approach to importance assessment.

Our earlier work took a preliminary step toward quantifying the importance of system objects (Mao et al., 2014), in which we focused on developing new methods for malware detection based on PageRank centrality.
However, there are still four important questions left unanswered:
• What are the characteristics of system objects and their interactions from a connected, system-wide perspective?
• What are the connections between network structures and the importance of system objects in security?
• How can we comprehensively evaluate a security importance assessment for system objects?
• Why is it feasible to detect malware based on a security importance assessment for system objects?

In this paper, we extend our earlier work by answering the above questions. In summary, the contributions and benefits of this paper are as follows.

• In-depth analysis of system-wide access behaviors via networked modeling. We construct a dependency network to encode security dependencies among process, file, and registry objects based on their interactions observed in access behaviors. This dependency network provides a system-wide view of the security dependencies between system objects. Meanwhile, we observe that this dependency network exhibits power-law degree distributions and the small-world effect, which demonstrates the existence of system objects with different levels of security importance.
• Network structure based importance assessment with respect to integrity. We define the importance of a system object with respect to integrity as the amount of influence it could have on the integrity of other system objects if it is compromised. Under this definition, we study the importance of system objects with respect to integrity within in-star, out-star, and chain structures. The security importance of a system object is quantified in terms of centrality, such as in-degree centrality, authority centrality, and PageRank centrality. We perform extensive evaluations of the importance metrics by investigating the convergence of the importance metrics over time, and by conducting various analyses of the ranking positions of system objects under the importance metrics to demonstrate their feasibility.
• Importance metric based malware detection. According to the classical integrity policies, "no read down" and "no write up" (Biba, 1977), we study the behavioral differences between benign and malicious processes under different importance metrics with respect to integrity. The observed behavioral differences demonstrate the feasibility of importance metric based malware detection. We leverage statistical learning techniques to discriminate malicious processes from benign ones without relying on human knowledge and involvement. Performance analysis of our malware detection method further demonstrates the feasibility of our networked approach to security importance assessment.

The remainder of this paper is organized as follows. We review related work in Section 2. Section 3 presents data dependency relationships and the dependency network. Before presenting our approach to malware detection in Section 5, we introduce centrality based importance metrics in Section 4. Section 6 presents our experiments and evaluation results. Section 7 discusses possible extensions and limitations. Finally, Section 8 concludes.
2. Related work

2.1. Complex networks in computer systems
In both biology and engineering, it is common to characterize systems by their intricate networked organizations. The underlying structures of these networks have been studied with the aim of exploring statistical properties, analyzing physical meanings, and predicting future behavior. Recently, the existence of networked structures in computer systems, such as software systems, file systems, and operating systems, has attracted growing interest (Concas et al., 2007; Hatton, 2009; Myers, 2003; Yan et al., 2010). Klemm et al. studied the topological structure of the directory-file storing network (Klemm et al., 2005), and proposed a growth model to understand the principles of storing behaviors on file systems (Klemm et al., 2006). Yan et al. (2010) characterized the call graph of the Linux kernel as a directed network, and studied its topology and evolution. By comparing it to genomes, they demonstrated the differences between biological systems and software systems in their design principles.

There is also literature on analyzing the data flow graphs of a single process or multiple processes (Agrawal, 1999; Fredrikson et al., 2010; Kwon et al., 2015; Wüchner et al., 2014, 2015). Previous works on the data flow graph of a single process did not provide an understanding of system-wide access behaviors (Agrawal, 1999; Fredrikson et al., 2010). Kwon et al. (2015) constructed a downloader graph which encoded the download behavior between executable files; the downloader graph only focused on the objects involved in download behaviors. Wüchner et al. (2014) proposed a tool for visualizing system level activities by constructing data flow graphs. However, visualization is not enough to provide insights into system-wide activities.

Importance assessment has been studied in social networks, co-authorship networks, and system call graphs from the perspectives of system stability (Sahinoglu, 2005), reliability (Borgonovo, 2007), and connectivity (Tong et al., 2010). However, there is little previous work focusing on the systematic assessment of object importance in a computer system from the perspective of security. This paper focuses on characterizing system objects and their activities, and on assessing the security importance of system objects. We investigate the system-wide security dependency network based on network science, and provide interesting insights that have not been addressed by previous works.
2.2. System protection
Previous works on system protection relied on devising security policies, with respect to data flow analysis or access behaviors, for system objects (Fattori et al., 2015; Sun et al., 2008; Sze and Sekar, 2013, 2015). Previous studies divided system objects into groups based on experience, such as grouping by integrity levels, or file directories, and devised policies for each group of system objects (Bhatkar et al., 2006; Fattori et al., 2015; Lanzi et al., 2010; Sun et al., 2008; Sze and Sekar, 2015; Xuan et al., 2009). For example, Sun et al. (2008) defined two integrity
levels and monitored the violation of integrity policies to defend against malware. They manually devised policies and determined the integrity level of objects according to their experience. Lanzi et al. (2010) built an access activity model enforced by heuristic policies to protect file system and registry. Xuan et al. (2009) manually devised 19 rules to conduct rootkit prevention via data flow analysis. Such techniques usually heavily rely on human experiences and efforts, which are inflexible in the face of different security requirements of different systems. In the face of hundreds of thousands of system objects, various types of access behaviors, and dynamic conditions in a real system, it is time consuming and error prone to carefully devise security policies for each system object, such as which processes can load, modify, or delete a file under what conditions. In contrast to previous works, we propose a centrality based importance metric with respect to integrity for system objects, which quantifies the importance of objects with respect to integrity automatically and systematically. Our importance metric provides administrators with a basis for prioritizing human efforts to devise fine-grained security policies of system objects as a proactive protection.
2.3. Behavior based malware detection
Taking access behaviors into account, our importance metric based malware detection is also related to dynamic behavior based malware detection. Previous works built behavioral profiles of programs and employed statistical learning algorithms to differentiate malware from benign programs (Apap et al., 2002; Heller et al., 2003; Jang et al., 2014; Wüchner et al., 2014, 2015). At the system call level, the n-gram is a classical profile of programs (Forrest et al., 1996). Canali et al. (2012) conducted a quantitative study on the accuracy of different system call/action based malware detection models. However, those techniques relied only on the statistical behavioral differences between benign and malicious processes, which neglected the semantics of the behaviors of processes.

Monitoring process execution with QEMU, Martignoni et al. (2008) proposed a behavior graph based malware detector. Fredrikson et al. (2010) extracted graphical profiles for each program individually, and analyzed the behavior graph with a graph mining algorithm to discriminate malware from benign programs. Wüchner et al. (2015) computed various graph features on the behavior graph of a process, and leveraged random forests to detect malware. Although those techniques explored the semantic meaning of behaviors from the perspective of a behavioral graph, their behavioral graphs were local and individual for each single process, which still neglected the security meaning of the behaviors of processes. We note that the essential difference between benign and malicious processes is the violation of security policies with respect to the security attributes of system objects, such as integrity or confidentiality. Our importance metric based approach naturally combines security policies with statistical behavioral differences to detect malicious programs.
3. Security dependency network
This paper explores a networked approach to assess the security importance of system objects from a system-wide perspective. The interactions between various system objects lead to dependencies between their security attributes. These dependencies further constitute a system-wide dependency network when we characterize the relationships between pairs of system objects.
3.1. Security dependency relationship
We refer to a system call that accesses resources as an access event, such as reading or writing a file or a registry. The access behaviors of a process are the sequence of access events during its execution. Each access event is associated with a pair of system objects and leads to a data flow either between a process and a file, or between a process and a registry (King and Chen, 2005). A data flow from one system object to another gives rise to a dependency between the security attributes of the two objects. For example, a read operation generates a data flow from a file to a process and makes the integrity of the process depend on that of the file. On the other hand, this data flow also makes the confidentiality of the file depend on that of the process. Thus, we have the following three definitions.

Definition 3.1. Security dependency relationship: If a data flow makes a security attribute of an object a depend on that of an object b, then there is a security dependency relationship from a to b, denoted a ⇒ b.

Definition 3.2. Integrity dependency relationship: If a data flow makes the integrity of an object a depend on that of an object b, then there is an integrity dependency relationship from a to b, denoted a ⇒^Int b.

Definition 3.3. Confidentiality dependency relationship: If a data flow makes the confidentiality of an object a depend on that of an object b, then there is a confidentiality dependency relationship from a to b, denoted a ⇒^Conf b.

In this paper, we focus on assessing the importance of system objects with respect to integrity, and discuss importance assessment with respect to confidentiality as an extension in Section 7. For simplicity, in the rest of the paper, we refer to a security dependency as an integrity dependency, and a ⇒ b indicates the integrity dependency relationship from a to b.

3.1.1. Dependency relationships between file and process

For every process, there is an integrity dependency relationship between it and its executable file: the process ⇒ the executable file. Moreover, we divide file related access events into two categories. One is file-process, which includes events such as CreateFile, QueryInformation, ReadFile, WriteFile, and so on. The other is process-process, which includes ProcessCreate.

We divide file-process events into two types, read and write. For a read event, such as QueryInformation or ReadFile, the direction of the data flow is from a file to a process. The data flow makes the integrity of the process depend on that of the file. Thus, the integrity dependency relationship is process ⇒ file. On the other hand, for a write event, such as SetInformation or WriteFile, the direction of the data flow is from process to file. The data flow makes the integrity of the file depend on that of the process. Thus, the integrity dependency relationship is process ⇐ file.

For the process-process event ProcessCreate, the parent process usually forks a child process with some arguments, or default arguments. There is a data flow from the parent process to the child process via the passed arguments. More precisely, the parent process passes arguments to the executable file of the child process. Meanwhile, the parent process achieves some goal or receives return values by executing the child process. Thus, the data flows between the parent process and the child process imply two integrity dependency relationships: a) child process ⇐ executable file of parent process; b) parent process ⇐ executable file of child process.

3.1.2. Security dependency relationships between registry and process

We divide registry related access events into two types, read and write. For a read event, such as RegQueryValue or RegEnumValue, the data flow is from a registry to a process. The data flow makes the integrity of the process depend on that of the registry. Thus, process ⇒ registry. On the other hand, for a write event, such as RegCreateValue, RegDeleteValue or RegSetValue, the data flow is from a process to a registry. The data flow makes the integrity of the registry depend on that of the process. Thus, process ⇐ registry.

3.2. Security dependency network

We encode system objects as vertices of a graph, and their dependency relationships as directed edges of the graph. More precisely, every distinct file, registry or process object corresponds to a vertex, and every distinct dependency relationship corresponds to a directed edge between vertices. We refer to this graph as a security dependency network. According to our definition of security dependency relationships between system objects, we give formal definitions of two security dependency networks as follows.

Definition 3.4. (File-Process Security Dependency Network). A directed graph G(V, U, E):
• V is a set of process objects distinguished by process name.
• U is a set of file objects distinguished by path.
• E = {e(n1, n2) | n1 ∈ V, n2 ∈ U, or n1 ∈ U, n2 ∈ V}, where e(n1, n2) denotes a security dependency relationship n1 ⇒ n2, which indicates that n1 depends on n2.

Definition 3.5. (Registry-Process Security Dependency Network). By substituting the set of registry objects for U in Definition 3.4, we obtain the definition of the registry-process security dependency network.
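To make the construction concrete, the following is a minimal sketch (not the implementation used in this paper) of how a file-process integrity dependency network could be built from simplified access events; the event tuples, operation names and paths below are illustrative assumptions.

```python
# Minimal sketch: building a file-process integrity dependency network
# from simplified (process, operation, file) access events with networkx.
import networkx as nx

# Illustrative events, e.g. extracted from a monitoring log (assumed format)
events = [
    ("winword.exe", "ReadFile",  r"C:\Windows\System32\ntdll.dll"),
    ("winword.exe", "WriteFile", r"C:\Users\alice\report.docx"),
    ("svchost.exe", "ReadFile",  r"C:\Windows\System32\ntdll.dll"),
]

READ_OPS = {"ReadFile", "QueryInformation"}
WRITE_OPS = {"WriteFile", "SetInformation"}

G = nx.DiGraph()  # edge a -> b means "a depends on b" (integrity dependency)
for process, op, path in events:
    if op in READ_OPS:       # read: process => file
        G.add_edge(("proc", process), ("file", path))
    elif op in WRITE_OPS:    # write: file => process
        G.add_edge(("file", path), ("proc", process))

print(G.number_of_nodes(), "vertices,", G.number_of_edges(), "edges")
```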
4. Security importance assessment
This section starts by illustrating the connection between dependency relationships and the importance of system objects in security. We then present our importance assessments by investigating network structures in the dependency network.

According to Wikipedia, importance is an indicator of significance or value from some perspective (Wikipedia, 2015). In this paper, we refer to the security importance of a system object as the impact of the compromise of a security attribute of this object on the security attributes of other system objects. Confidentiality, integrity, and availability (CIA) are three fundamental security attributes for the protection of a system. In system protection, securing the integrity of the system is often regarded as a necessary first step toward securing confidentiality and availability (Fraser, 2000; Mao et al., 2011; MICROSOFT, 2014; Sun et al., 2008; Sze and Sekar, 2013; Vijayakumar et al., 2012). This paper focuses on assessing the importance of system objects with respect to integrity, and discusses an extension to importance assessment with respect to confidentiality in Section 7.
4.1. Importance in integrity
Definition 4.1. (Importance in Integrity). For a system object, its importance in integrity is the amount of influence it could have on the integrity of other objects, if its integrity is compromised.
Fig. 1 illustrates reading and writing behaviors between a file and a process to explain the influence on integrity. In Fig. 1a, the reading event makes the integrity of p depend on that of f. If the integrity of f is compromised, the integrity of p will be threatened, since p will read untrusted data from f. Meanwhile, in Fig. 1b, the writing event makes the integrity of f depend on that of p. If the integrity of p is compromised, then the integrity of f will be threatened, since untrusted data would be written to f by p. Similar observations can be made between registry and process. The security dependency network provides information about the scope of the impact, and about how other objects will be influenced, when the integrity of a system object is compromised. Thus, we utilize the security dependency network as a basis for assessing the importance of system objects in security, and propose a networked approach to conduct importance assessment.

Fig. 1 – Access behaviors and data dependency relationships between a file f and a process p.

4.2. Networked structures and importance in integrity

The security dependency network is large and complex. It contains all observed security dependencies between system objects. There are some interesting basic structures in the network, which shed light on the importance of system objects with respect to security.

4.2.1. In-star structure

Fig. 2 – In-star structures in the file-process dependency network.

We first explore the in-star structure of a file f and a process p in the dependency network, as shown in Fig. 2. More dependencies on an object imply more importance, because of more potential influence on others if it is compromised. Thus, we can quantify the importance of f and p by counting the number of processes that read (i.e., depend on) f, and the number of files that p writes to (i.e., that depend on p), respectively. These numbers are in-degrees of vertices in the security dependency network. This metric for importance assessment is called degree centrality, or more precisely in-degree centrality, in network science (Newman, 2010). Thus, in-degree centrality provides a security importance metric with respect to integrity for system objects in in-star structures.

4.2.2. Out-star structure

Fig. 3 – Out-star structures in the file-process dependency network.

There are limitations in assessing importance by counting only the in-degree of system objects. The importance assessment for a system object should consider not only the number of its in-neighbors, but also the information carried by its neighbors. In Fig. 3a, if a process p reads many important files, i.e., f1, f2, ⋯, fn−1 are known to be important, then fn, which is also read by p, is more likely to be important. Here, p serves as an indicator of whether the objects it reads are important or not. With respect to writing behaviors, as shown in Fig. 3b, if a file f is written by many important processes, i.e., p1, p2, ⋯, pn−1 are known to be important, then pn, which also writes to f, is more likely to be important. Here, f serves as an indicator of whether the processes that write to it are important or not. Based on these observations, we are able to take advantage of authority centrality to quantify the importance of system objects with respect to integrity. Authority centrality was defined when Kleinberg proposed an algorithm, named hyperlink-induced topic search (HITS), to discover authoritative web pages (Kleinberg, 1999). HITS also defines a hub centrality that works together with authority centrality. The intuition behind HITS is that a good hub is a page that points to many good authorities, while a good authority is a page that is pointed to by many good hubs. In our problem, as shown in Fig. 3a, the hub centrality of p indicates the degree to which p reads many important files, while the authority centrality of a file fi indicates its importance with respect to its influence on other processes if fi is compromised. Working with hub centrality, authority centrality provides an importance metric for system objects in out-star structures with respect to integrity. Moreover, hub centrality aggregates the information about the importance of out-neighbors, and passes the information back to the out-neighbors, which helps to share similarity in importance among 2-hop neighbors, such as between f1 and fn, and between p1 and pn, in Fig. 3. Formally, the authority centrality x and the hub centrality y are calculated iteratively until convergence as follows:
x(t + 1) = A y(t),    y(t + 1) = A^T x(t),
where A is the adjacency matrix of the dependency network G.
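As an illustration of the iteration above, the following sketch (an assumption, not the paper's code) runs the authority/hub updates on a toy adjacency matrix, normalizing at each step so that the scores converge in direction.

```python
# Illustrative power iteration of the authority/hub updates
# x(t+1) = A y(t), y(t+1) = A^T x(t) on a small toy dependency network.
import numpy as np

A = np.array([[0, 1, 1],
              [0, 0, 1],
              [0, 0, 0]], dtype=float)  # toy adjacency matrix of G (assumed)

x = np.ones(A.shape[0])  # authority scores
y = np.ones(A.shape[0])  # hub scores
for _ in range(100):
    x_new = A @ y          # update authorities from current hubs
    y_new = A.T @ x        # update hubs from current authorities
    x = x_new / (np.linalg.norm(x_new) or 1.0)  # normalize to avoid blow-up
    y = y_new / (np.linalg.norm(y_new) or 1.0)

print("authority:", x.round(3), "hub:", y.round(3))
```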
4.2.3. Chain structure

Fig. 4 – Chain structures in the file-process dependency network.

Furthermore, we explore the chain structure in the security dependency network, which indicates the propagation of the impact caused by a compromised object. In this circumstance, we consider the importance of both direct and indirect neighbors to assess the importance of a system object. More precisely, the neighbors are direct and indirect in-neighbors in the security dependency network. Fig. 4 illustrates an example of the chain structure in the file-process security dependency network. Fig. 4a shows the chain of impact propagation when the integrity of a file f1 is compromised. Note that the direction of the propagation is the reverse of the direction of integrity dependency. Fig. 4b exhibits the chain of impact propagation when the integrity of a process p1 is compromised; the direction of the propagation is again the reverse of the direction of integrity dependency. Thus, for importance assessment, we need to aggregate the importance of objects along the chain of impact propagation. Such an approach to measuring importance is called PageRank centrality in network science (Newman, 2010). PageRank was initially proposed to identify important web pages (Brin and Page, 1998). The intuition behind PageRank centrality is that a web page is important if it is pointed to by many important pages. In our problem, an object is important because the security attributes of many important objects depend on it. As shown in Fig. 4, the existence of an important object on the chain of impact propagation makes the object at the starting point of the chain important. Thus, PageRank centrality provides an importance metric for system objects in chain structures with respect to integrity. Let A denote the adjacency matrix of the dependency network; the PageRank centrality q is the stationary point satisfying

q = \frac{1-c}{n}\,\mathbf{1} + c\left(\left[\frac{1}{d_1}, \ldots, \frac{1}{d_n}\right]A + D\right)q,    (1)

D = \frac{1}{n}\left[\mathbb{1}(d_1 = 0), \ldots, \mathbb{1}(d_n = 0)\right]^{T}\mathbf{1}^{T},    (2)

where c is a damping factor indicating the restart in a random walk (Brin and Page, 1998), d_j is the out-degree of object j, and D is a matrix representing the dangling objects whose out-degree is zero; \mathbb{1}(d_i = 0) is an indicator function, equal to 1 if d_i = 0 and 0 otherwise. Equivalently, in matrix notation, it can be expressed as

q = Mq, \qquad M = c\left[\frac{1}{d_1}, \ldots, \frac{1}{d_n}\right]A + cD + \frac{1-c}{n}\,\mathbf{1}\mathbf{1}^{T}.    (3)

The random walk with restart behind the PageRank centrality guarantees a stationary distribution, which indicates a unique solution. Fig. 5 illustrates an example of PageRank centrality for a small file-process dependency network. Fig. 5a shows the original file-process dependency network with the adjacency matrix A, while Fig. 5b shows the "augmented" dependency network with the transition matrix M for calculating the PageRank centrality. In Fig. 5b, nodes and solid lines correspond to the first term in Eq. (3), nodes and dashed lines correspond to the term cD, and nodes and dotted lines correspond to the last term in Eq. (3). The transition probabilities of the solid lines are shown in the figure; the transition probability for dashed lines is c/5, while that for dotted lines is (1 − c)/5. We are able to calculate the PageRank centrality iteratively via q(t + 1) = Mq(t) until convergence. c is a hyper-parameter, and 1 − c indicates the restarting probability of the random walk. We tune it via cross validation. In our experiments, we observe no significant difference in malware detection results when c is larger than or equal to 0.8.

Fig. 5 – An example of calculating PageRank centrality.

4.2.4. Importance assessment with respect to integrity

Fig. 6 – Clique structures in the file-process dependency network.

The three simple structures discussed in this section inspire us to utilize various centrality metrics from network science to quantify the security importance of system objects. We treat other structures in the security dependency network as combinations of the three simple networked structures (Milo et al., 2002). For example, Fig. 6a exhibits a 2-by-2 circle structure, which is a combination of a chain structure and a star structure. Fig. 6b shows a 2-by-2 acyclic clique structure, which is a combination of two star structures. As a result, the three centralities discussed in this section provide meaningful metrics to assess the importance of system objects in a security dependency graph. More concretely, we assess the importance of system objects with respect to integrity as follows.

1. Capture system level access events on files and registries.
2. Construct the integrity dependency network from the recorded access behaviors.
3. Calculate a centrality metric, such as in-degree centrality, of the system objects in the integrity dependency network.
4. The centrality metric is the importance metric for system objects with respect to integrity.

For importance assessment with respect to confidentiality, the same procedure applies, except that the integrity dependency network is replaced by the confidentiality dependency network. We discuss this further in Section 7.
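The following hedged sketch illustrates steps 2–4 of this procedure on a toy dependency network; networkx's hits and pagerank routines are used as stand-ins for the authority and PageRank metrics described above, and may differ in details (e.g., normalization and dangling-node handling) from the formulation in Eqs. (1)–(3).

```python
# Sketch of centrality-based importance metrics on an integrity
# dependency network G, where an edge a -> b means "a depends on b".
import networkx as nx

def importance_metrics(G: nx.DiGraph, c: float = 0.85):
    in_degree = dict(G.in_degree())               # in-star metric
    hubs, authorities = nx.hits(G, max_iter=500)  # out-star metric (HITS)
    pagerank = nx.pagerank(G, alpha=c)            # chain metric (damping c)
    return in_degree, authorities, pagerank

# Toy example: rank objects by the PageRank-based importance metric
G = nx.DiGraph([("p1", "f1"), ("p2", "f1"), ("f2", "p1")])
_, _, pr = importance_metrics(G)
print(sorted(pr.items(), key=lambda kv: -kv[1]))
```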
5. Importance metric based malware detection

Importance metrics assign security importance to system objects with respect to a particular security attribute. This provides us with a novel and interesting way of discriminating malicious processes from benign ones, by examining their behaviors against the security policies of security models, such as the Biba model (Biba, 1977). In this section, we present an importance metric based malware detection method which leverages information from both security policies and statistical discriminations between benign and malicious processes.

5.1. Threat model

The threat model we adopt in this paper considers malware or an attack which compromises the operating system via operations on files, registries or processes. Our model does not aim to detect malware or attacks which compromise the operating system via memory operations (Feng et al., 2014) or any hardware-based attack. For example, we do not aim to detect malware or attacks which install or execute their malicious payload into or from memory without operations on the file system or registry. In addition, we must assume that no malicious access trace is involved when we construct the dependency network of system-wide benign access traces to assess the security importance of system objects.

5.2. Malware detection

"No read down" (NRD) and "no write up" (NWU) have become the most fundamental policies for attack prevention with respect to the integrity of system objects (Biba, 1977; Fraser, 2000; Mao et al., 2011; Sun et al., 2008; Vijayakumar et al., 2012). Meanwhile, as suggested by previous work (MICROSOFT, 2014; Sze and Sekar, 2015), it is feasible to secure a Windows system by the NWU policy alone. Our importance metric serves as a proxy for quantifying the integrity of a system object. Thus, if we achieve an accurate importance metric for system objects with respect to integrity, we are able to detect malware or malicious activities according to either of the two following policies.

Policy 5.1. For a process,
• if it writes to highly important objects (NWU only);
• or, if the importance of the objects it reads is lower than that of the objects it writes to (both NRD and NWU);
then the process is malicious.

Policy 5.1 is reasonable in terms of classical Mandatory Access Control (MAC) models such as the Biba model. The activities that violate this policy are dangerous since they threaten the security requirements of a running system. Malware has a high probability of violating Policy 5.1, since malware usually appears to be unimportant and tampers with important objects to gain more control of the system. In reality, benign processes may also violate the policy in a Voluntary Access Control (VAC) system. For example, a benign process may read objects whose importance in integrity is slightly lower than that of the objects it writes to. However, these patterns of violations by benign processes are usually different from those of malware. Thus, although the enforcement of Policy 5.1 may lead to false alarms in modern operating systems, it offers us insights into devising statistical classifiers to distinguish malicious processes from benign ones from the perspective of importance in integrity. In the next section, we present how to construct the importance metric based behavioral profile for our approach to malware detection.
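As an illustration only (not the enforcement mechanism evaluated in this paper), a check of Policy 5.1 could be expressed as follows; the importance values and the threshold HIGH are assumptions.

```python
# Illustrative Policy 5.1 check on per-object importance values in [0, 1].
HIGH = 0.8  # assumed threshold for "highly important" objects

def violates_policy(read_importance, write_importance, nwu_only=False):
    # "No write up" (NWU): writing to a highly important object
    if any(w >= HIGH for w in write_importance):
        return True
    if nwu_only or not read_importance or not write_importance:
        return False
    # Combined NRD and NWU: reads are less important than writes
    return min(read_importance) < max(write_importance)

print(violates_policy(read_importance=[0.2], write_importance=[0.9]))  # True
```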
5.3. Importance metric based behavioral profile
The behavioral profile of a process consists of the type of its access behaviors and the ranking positions of its accessed objects under the importance metric. We first rank system objects by their importance values under the importance metric. Objects with the same importance value are at the same ranking position. Then we count the number of unique accessed system objects at each ranking position. Formally, the behavioral profile, also known as the feature vector, of a process i is

X_i = \left[ x_i^{(\mathrm{read})}, x_i^{(\mathrm{write})} \right], \qquad x_i^{(a)} = \left[ x_{i1}^{(a)}, \ldots, x_{ij}^{(a)}, \ldots, x_{iR}^{(a)} \right],    (4)

where x_i^{(a)} represents a vector of the numbers of unique objects accessed with behavior a ∈ {read, write} at each ranking position. Each entry x_{ij}^{(a)} is the number of objects accessed with behavior a at rank position j, and R is the total number of ranking positions under the importance metric for file and registry objects. With such importance metric based behavioral profiles of processes, we propose an importance metric based malware detection method.
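The following sketch illustrates how the profile of Eq. (4) could be computed; the event list and the importance mapping are assumed inputs, not the paper's data format.

```python
# Build the behavioral profile X_i of Eq. (4): count unique objects per
# (behavior, rank) pair, where objects sharing an importance value share a rank.
from collections import defaultdict
import numpy as np

def behavioral_profile(events, importance):
    values = sorted(set(importance.values()), reverse=True)
    rank = {v: j for j, v in enumerate(values)}   # importance value -> rank index
    R = len(values)

    seen = {"read": defaultdict(set), "write": defaultdict(set)}
    for op, obj in events:                        # op is "read" or "write"
        seen[op][rank[importance[obj]]].add(obj)

    x = {a: np.array([len(seen[a][j]) for j in range(R)]) for a in ("read", "write")}
    return np.concatenate([x["read"], x["write"]])  # X_i = [x_i^(read), x_i^(write)]

profile = behavioral_profile(
    [("read", "ntdll.dll"), ("write", "tmp.log")],   # assumed per-process events
    {"ntdll.dll": 0.9, "tmp.log": 0.1},              # assumed importance values
)
print(profile)
```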
5.4. Method description

Fig. 7 – Framework of importance metric based malware detection.

Fig. 7 illustrates the framework of our approach to malware detection. We construct file-process and registry-process security dependency networks from benign access traces in the training set. We quantify the importance of system objects as presented in Section 4. Before we train a statistical classifier, we construct importance metric based behavioral profiles for both benign and malicious processes. Finally, malware detection is performed on benign and malicious processes in the testing set with the help of the trained classifier. As we stated at the beginning of Section 5.2, if we achieve an accurate importance metric for system objects with respect
to integrity, we are able to detect malware or malicious activities according to Policy 5.1, and achieve encouraging performance by our importance metric based malware detection. Thus, the performance on malware detection provides a way of evaluating the feasibility of our approach to importance assessment in security. The importance metric based behavioral profile naturally encodes security policies of access behaviors. The importance metric connects security policies for access behaviors with statistical discriminations of benign and malicious processes. We demonstrate such connections via experiments on a real-world dataset in Section 6.
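For concreteness, a hedged end-to-end sketch of the training and detection steps in Fig. 7 is given below; the placeholder data and the choice of a random forest classifier are assumptions, since this section only specifies that statistical learning techniques are used.

```python
# End-to-end sketch: train a classifier on importance-metric-based profiles
# and score unseen processes. Placeholder data stands in for real profiles.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_train = rng.random((200, 20))            # behavioral profiles (placeholder)
y_train = rng.integers(0, 2, 200)          # labels: benign = 0, malicious = 1
X_test = rng.random((50, 20))

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
scores = clf.predict_proba(X_test)[:, 1]   # maliciousness score per process
print((scores > 0.5).sum(), "processes flagged as malicious")
```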
6. Evaluations and experimental results
In this section, we first describe our data collection in Section 6.1. And then, we evaluate our importance metrics for system objects from five aspects as follows. 1) We characterize the structural properties of the security dependency networks in Section 6.2. 2) We study the feasibility of importance metrics in Section 6.3. 3) We evaluate the assessed importance of system objects with our domain knowledge in terms of their ranking positions under importance metrics in Section 6.4. 4) We explore the discrimination between benign and malicious processes with respect to the security importance in Section 6.5. 5) We evaluate the performance of our importance metric based malware detection in Section 6.6.
6.1. Data collection

6.1.1. Benign samples
Process Monitor (Cogswell and Russinovich, 2014) is an IRP (I/O Request Packet) based monitoring tool for Windows that captures real-time file system, registry and process activities. We employed Process Monitor to trace the access events of benign processes on twelve different users' machines, which were running either Windows XP sp3 (32 bit) or Windows 7 (64 bit). Among those twelve users, three were male undergraduates who were working on their final year projects, and nine were graduate students, consisting of one female student and eight male students. Their activities included writing, programming, web surfing, etc. Overall, we collected 27,840 and 11,781 access traces of benign processes on Windows XP sp3 and Windows 7, respectively. We constructed file-process security dependency networks and registry-process security dependency networks from the extracted access events of benign processes. Table 1 presents basic statistics of our dataset. The large number of system objects and their dependency relationships demonstrates the difficulty of quantifying the importance of objects either manually or heuristically.

Table 1 – Basic statistics of the data collection and the security dependency networks. |N| is the number of vertices, |E| is the number of edges.

OS | User | Logged (hours) | Total (days) | File-process vertices | File-process edges | Registry-process vertices | Registry-process edges
XP | 1 | 68 | 14 | 39,797 | 68,385 | 103,198 | 267,085
XP | 2 | 98 | 10 | 19,393 | 33,361 | 150,540 | 260,172
XP | 3 | 125 | 13 | 14,900 | 24,903 | 41,815 | 99,666
XP | 4 | 110 | 14 | 134,742 | 150,786 | 31,595 | 77,871
XP | 5 | 111 | 16 | 27,474 | 57,100 | 46,917 | 145,828
XP | 6 | 80 | 9 | 31,431 | 48,182 | 32,972 | 78,696
XP | 7 | 60 | 8 | 25,717 | 40,106 | 29,104 | 77,225
XP | 8 | 83 | 7 | 35,516 | 57,322 | 42,642 | 90,210
XP | Subtotal | 735 | 81 | 312,286 | 446,132 | 344,824 | 872,170
Win 7 | 9 | 110 | 9 | 103,359 | 126,393 | 265,756 | 385,185
Win 7 | 10 | 77 | 14 | 59,757 | 93,453 | 177,362 | 432,339
Win 7 | 11 | 42 | 8 | 18,996 | 24,625 | 279,335 | 388,278
Win 7 | 12 | 117 | 11 | 105,519 | 158,035 | 188,147 | 308,899
Win 7 | Subtotal | 346 | 42 | 287,631 | 402,506 | 910,600 | 1,514,701
6.1.2. Malicious samples

We collected two datasets of malicious executable files to evaluate our approach from two perspectives: 1) exploring the accuracy of our approach on a large scale dataset; 2) examining the ability to detect new malware. For the first dataset, we collected a large number of malware samples from VxHeaven (VXHeaven, 2010). To evaluate the ability to detect new malware, we collected a portion of executable files from the MALICA dataset (Nappa et al., 2013). We examined the first time that a sample was seen by VirusTotal (VirusTotal, 2016) and the TimeDateStamp encoded in the binary file to ensure that the samples collected from the MALICA dataset were newer than those from VxHeaven. Each malicious sample was executed for five minutes in a sandbox: Windows XP sp3 running in VMWare without a network connection. All access events of malware samples were traced by Process Monitor. Some samples did not exhibit any access behavior because of the missing network connection, lack of stimulation, or insufficient waiting time. As suggested by previous works (Fattori et al., 2015; Lanzi et al., 2010), we examined the logs captured by Process Monitor and removed the logs of samples that did not exhibit any writing behavior. Finally, we obtained 7257 malware samples (executable files), consisting of 2112 Trojans, 2563 Worms and 2582 Viruses, from VxHeaven as the first dataset. For the second dataset, we randomly selected 5% new samples, and further obtained access traces of 234 malware samples after executing them in our sandbox.

6.2. Characterizing security dependency networks

We characterize the structural properties of the security dependency networks, which are constructed from observed access events of benign processes on eight Windows XP sp3 machines and four Windows 7 machines, to understand their properties. These structural statistics demonstrate the existence of system objects with different levels of importance, and the need to assess security importance via networked structures.

6.2.1. Degree distribution of dependency networks
Fig. 8 – In-degree and out-degree distributions in the file-process and registry-process dependency networks, in terms of complementary cumulative distribution functions (CCDF). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 8 illustrates the in-degree and out-degree distributions of the file-process and registry-process security dependency networks on doubly logarithmic scales. Fig. 8a and c show the in- and out-degree distributions of file objects and registry objects in the file-process and registry-process dependency networks constructed from access events of Windows XP and Windows 7 systems. Fig. 8b and d show the in- and out-degrees of process objects in the two dependency networks of Windows XP and Windows 7 systems. Visually, the in-degrees of file objects and registry objects, and the in- and out-degrees of process objects, obey power law distributions, P(k) ∼ k^(−γ), appearing as straight lines on a log-log scale. Furthermore, we leverage poweRlaw (Gillespie, 2015) to conduct goodness-of-fit tests to examine the power-law distributions, as suggested by Clauset et al. (2009). Table 2 shows the results of the power law fits of the degree distributions and the corresponding p-values. As Clauset et al. (2009) suggested, if the p-value is greater than 0.1, the power law is a plausible hypothesis for the data. The statistically significant p-values are highlighted in Table 2.

Table 2 – Power law fits of degree distributions and corresponding p-value (statistically significant values are denoted in bold).

OS | Dependency network | Object | In-degree x̂_min | In-degree α̂ | In-degree p | Out-degree x̂_min | Out-degree α̂ | Out-degree p
XP | File-process | File | 26.3 | 1.98 | 0.1 | 11.9 | 2.04 | 0.01
XP | File-process | Process | 34.5 | 1.55 | 0.64 | 61.8 | 1.80 | 0.42
XP | Registry-process | Registry | 32.0 | 2.52 | 0.08 | 87.6 | 3.12 | 0.67
XP | Registry-process | Process | 87.7 | 1.77 | 0.28 | 547.5 | 1.89 | 0.56
Win 7 | File-process | File | 17.4 | 2.34 | 0.1 | 14.1 | 3.62 | 0.29
Win 7 | File-process | Process | 48.2 | 1.57 | 0.55 | 261.2 | 1.84 | 0.97
Win 7 | Registry-process | Registry | 35.1 | 3.26 | 0.12 | 40.3 | 7.12 | 0.0
Win 7 | Registry-process | Process | 165.1 | 1.79 | 0.26 | 505.2 | 1.68 | 0.0

The power-law distributed in- and out-degrees indicate that the security dependency network is scale-free. Meanwhile, the observed power law distributions can be interpreted by the preferential attachment process, which indicates that "the rich get richer" (Newman, 2010). Looking at the blue lines of the in-degree distribution of file objects in Fig. 8a, preferential attachment suggests that a file object which has been read by many process objects is more likely to be read by subsequent process objects in the future. This is different from a random network with a Poisson degree distribution (Newman, 2010), which suggests that each file has the same probability of being read by subsequent process objects. Thus, if an attacker compromises the integrity of a file which has been read by many process objects, the compromised file will be read by subsequent processes with high probability. As a result, the compromise of such a file has more influence on the integrity of other system objects than the compromise of a file which has been read by only a few process objects. According to our definition, this suggests that the importance of a file object can be quantified by counting the number of process objects that read it, which is the in-degree of the file in the integrity dependency network. Similar observations hold when analyzing the importance of a registry object or a process object under the preferential attachment of their in-degrees. Hence, the preferential attachment process suggests assessing the importance of system objects by counting their in-degrees.
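As a rough illustration of this analysis (the paper itself uses the R package poweRlaw), the Python powerlaw package can perform a similar Clauset-style fit on an in-degree sample; the random graph below is only a placeholder for a real dependency network.

```python
# Sketch: estimate the power-law exponent of an in-degree distribution and
# compare it against an exponential alternative via a likelihood-ratio test.
import networkx as nx
import powerlaw

G = nx.gnp_random_graph(2000, 0.005, directed=True)   # placeholder network
in_degrees = [d for _, d in G.in_degree() if d > 0]

fit = powerlaw.Fit(in_degrees, discrete=True)          # Clauset-style xmin/alpha fit
print("alpha:", fit.power_law.alpha, "xmin:", fit.power_law.xmin)

R, p = fit.distribution_compare("power_law", "exponential")
print("log-likelihood ratio:", R, "p-value:", p)
```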
6.2.2. Diameter and small-world effect of dependency networks

Fig. 9 – Diameter and average (avg.) length of shortest path of the security dependency network as the number of vertices increases.

Fig. 9 illustrates the diameter and average shortest path length of the file-process and registry-process security dependency networks as the number of vertices increases. The diameter is the length of the longest shortest path between any pair of vertices in the network. A network is a small-world network if its diameter increases logarithmically with the number of its vertices (Newman, 2010). We observe a logarithmic increase of the diameter with the number of vertices in both Fig. 9a and c. Meanwhile, we observe an even more pronounced logarithmic increase of the average shortest path length with the number of vertices in Fig. 9b and d. In file-process security dependency networks, two files are reachable from each other within at most 12 security dependency relationships, and 4 security dependency relationships on average, in both Windows XP sp3 and Windows 7 systems. In registry-process security dependency networks, two registries are reachable from each other within at most 8 dependency relationships in Windows XP sp3 systems, at most 12 dependency relationships in Windows 7 systems, and 4 dependency relationships on average in both versions. These observations suggest the small-world effect in both file-process and registry-process security dependency networks. The small-world effect indicates that most pairs of vertices are connected by a short path, which implies fast spreading of the impact of a compromise in real operating systems. Meanwhile, the small-world effect indicates the existence of "hubs", i.e., objects connected to many other objects in the operating system. The compromise of such hubs has much more influence than the compromise of other objects. This further indicates different levels of security importance for different system objects.
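The following sketch shows how these two quantities could be measured on a dependency network; the toy graph and the choice to compute path lengths on the undirected largest weakly connected component are assumptions.

```python
# Sketch: diameter and average shortest path length of a dependency network,
# measured on its largest weakly connected component treated as undirected.
import networkx as nx

G = nx.DiGraph([("p1", "f1"), ("p2", "f1"), ("f2", "p1"), ("p2", "f3")])  # toy graph
H = G.subgraph(max(nx.weakly_connected_components(G), key=len)).to_undirected()

print("diameter:", nx.diameter(H))
print("avg shortest path:", nx.average_shortest_path_length(H))
```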
6.3. Feasibility of importance metric
We demonstrate the feasibility of our importance metrics in security from two perspectives: 1) The local structures which form a basis for our importance metrics are abundant and universal in a security dependency network. 2) The calculation of importance metrics converges quickly and is practical in real systems.
6.3.1. Local structure of dependency network

We explore local structures of the dependency network to examine the feasibility of the centrality metric based importance assessment for system objects. We are interested in three types of local structures in the security dependency network: in-star, out-star, and chain structures, as shown in Figs. 2–4. We demonstrate that these structures are universal and present in large numbers in the security dependency networks, which constitutes a basis for our importance assessments.

The in-degree distribution shows the statistics of in-star structures. As shown in Fig. 8a, in Windows XP sp3 systems, we observe that about 2% of file objects, 10% of registry objects, and 75% of process objects have in-star structures with in-degree at least two. Meanwhile, in Fig. 8b, we observe that 0.4% of file objects, 3.5% of registry objects, and 100% of process objects have out-star structures with out-degree at least two. Similar statistics are observed in Windows 7 systems, as shown in Fig. 8c and d. The power-law distributed degrees imply many in-star and out-star structures with large in-degrees and out-degrees in the security dependency network.

Fig. 10 – Distribution of the fraction of reachable objects in the dependency network. For example, a and c are reachable to b and d, respectively.

A chain structure indicates reachable objects along a path in the security dependency network. We count the fraction of unique objects reachable along a path of length 3, as shown in Fig. 10, which provides statistics for chain structures in the dependency network. We observe that, in Windows XP systems, along the read-write-read (RWR) chain structure, 155,878 out of 311,756 (50%) files are reachable to 477 out of 530 (90%) processes, and 172,154 out of 344,308 (50%) registries are reachable to 371 out of 530 (70%) processes. Meanwhile, along the write-read-write (WRW) chain structure, 186 out of 530 (35%) processes are reachable to 124,702 out of 311,756 (40%) files, and 239 out of 530 (45%) processes are reachable to 206,585 out of 344,308 (60%) registries. The large fraction of reachable objects indicates a large number of chain structures of length 3 in the security dependency network. Furthermore, as shown in Fig. 9b, pairs of system objects are reachable to each other along a path of length 4 on average, which implies many chain structures of length greater than 3 in the security dependency network. The large number of in-star, out-star and chain structures supports the feasibility of importance assessment by centralities.
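A simplified way to gather such reachability statistics is sketched below; it ignores the read/write typing of the edges (unlike the RWR/WRW chains above) and uses an assumed toy network.

```python
# Sketch: fraction of file objects from which some process object is
# reachable within a path of length 3 in the dependency network.
import networkx as nx

G = nx.DiGraph([("p1", "f1"), ("f1", "p2"), ("p2", "f2"), ("f2", "p3")])  # toy chain

files = [n for n in G if n.startswith("f")]
procs = {n for n in G if n.startswith("p")}

def reaches_process(g, src, cutoff=3):
    lengths = nx.single_source_shortest_path_length(g, src, cutoff=cutoff)
    return any(node in procs for node in lengths if node != src)

frac = sum(reaches_process(G, f) for f in files) / len(files)
print("fraction of files reaching a process within 3 hops:", frac)
```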
6.3.2. Convergence of assessed importance

Fig. 11 – Convergence of importance values over time. It shows the Spearman rank correlation coefficient between the importance ranking on the last day and that on previous days.

The dynamics and variety of access behaviors lead to the evolution of the security dependency network in a real operating system. It would be impractical if the quantified importance values of objects varied greatly as the security dependency network evolves. Hence, we explore the convergence of the importance values of objects under the importance metrics in terms of correlations between importance rankings, as shown in Fig. 11. More specifically, we compare the ranking position of a file or registry object on the last day with its ranking positions on previous days, in terms of Spearman rank correlation coefficients. In Fig. 11, the x-axis shows the observation time in days. The left y-axis presents the Spearman rank correlation coefficient between the ranking positions of objects on the last day and those on a particular day. The right y-axis indicates the number of edges in the security dependency network, and the black line shows the number of edges in the dependency network as time proceeds. We observe that, although the number of edges increases with time, the ranking positions of both file and registry objects exhibit strong consistency for all three importance metrics. This indicates that we can achieve stable importance assessment results for objects in a short time, and suggests the feasibility of calculating the importance metrics in practice.
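The consistency check itself reduces to a Spearman rank correlation between two importance assignments; a minimal sketch with assumed daily importance values follows.

```python
# Spearman rank correlation between importance values on an earlier day and
# on the last day, restricted to objects seen on both days (assumed data).
from scipy.stats import spearmanr

day3 = {"ntdll.dll": 0.90, "kernel32.dll": 0.80, "tmp.log": 0.01, "a.dll": 0.30}
last = {"ntdll.dll": 0.92, "kernel32.dll": 0.79, "tmp.log": 0.02, "a.dll": 0.28}

common = sorted(set(day3) & set(last))
rho, pval = spearmanr([day3[o] for o in common], [last[o] for o in common])
print("Spearman rho:", round(float(rho), 3), "p-value:", round(float(pval), 3))
```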
6.4. Importance assessment: case studies
Given such a huge number of system objects, it is not easy to evaluate the assessed importance of system objects one by one.
In this subsection, we evaluate the assessed importance of system objects with our domain knowledge from two aspects: 1) The most important system objects. 2) Importance of system objects by categories.
6.4.1. Most important system objects
First, we examine the most important system objects and their roles in security. Table 3 illustrates the eight most important process, file and registry objects under the importance metrics. In Table 3, columns 2–4 list the eight most important process objects under the importance metrics. Several web browser processes are among the top ranked by the in-degree and authority metrics. This is because the in-degree and authority metrics focus on the direct neighbors of an object in the security dependency network, while web browsers usually write many unimportant temporary files. On the other hand, the PageRank metric considers both direct and indirect neighbors, and achieves more reasonable results. System processes, such as system, explorer, and svchost, and security processes, such as ekrn, 360tray, and 360safe, are among the top ranked processes by the PageRank metric. These are indeed important processes in terms of security, and this result suggests that the PageRank metric based importance assessment is more reasonable for process objects than the other two metrics. The eight most important file objects are the same under all three metrics, as shown in column 5. All listed files play critical roles in Windows operating systems. For example, ntdll.dll contains NT kernel function entries. kernel32.dll is the core Microsoft Windows kernel module handling memory management, I/O operations and interrupts. secur32.dll contains Windows Security function entries. advapi32.dll provides advanced API services for security and the registry. rpcrt4.dll is the Remote Procedure Call (RPC) API, used by Windows applications for network and Internet communication. This result suggests that in-degrees and out-degrees are important for the importance assessment of file objects, while the PageRank metric achieves results consistent with the other two.
Table 3 – Eight most important process, file, and registry objects. The file and registry rankings are identical under the in-degree, authority, and PageRank metrics.

Rank  Process (In-degree)  Process (Authority)  Process (PageRank)  File          Registry
1     theworld.exe         theworld.exe         ekrn.exe            ntdll.dll     hklm/system/currentcontrolset/control/terminal server/tsappcompat
2     rundll32.exe         rundll32.exe         system              kernel32.dll  hklm/software/microsoft/windows nt/currentversion/windows/appinit_dlls
3     iexplore.exe         360chrome.exe        360tray.exe         secur32.dll   hklm/system/currentcontrolset/control/session manager/criticalsectiontimeout
4     chrome.exe           chrome.exe           360safe.exe         advapi32.dll  hklm/software/policies/microsoft/windows/safer/codeidentifiers/transparentenabled
5     360chrome.exe        iexplore.exe         explorer.exe        rpcrt4.dll    hklm/system/setup/systemsetupinprogress
6     sogouexplorer.exe    sogouexplorer.exe    svchost.exe         lpk.dll       hklm/software/microsoft/windows nt/currentversion/languagepack
7     cleanmgr.exe         sysocmgr.exe         rundll32.exe        user32.dll    hkcu/software/classes
8     msiexec.exe          storagetool.exe      chrome.exe          gdi32.dll     hklm/software/microsoft/ctf/systemshared/cuas
All listed file objects are in the directory windows\system32.
The eight most important registry objects are also the same under all three metrics, as shown in the last column. All listed registries play critical roles in Windows operating systems. For example, appinit_dlls specifies DLLs that are loaded by every application; malware often sets this key to inject a malicious DLL into every process at start-up, e.g., Trojan.Downloader.Conhook.AK. The registry value transparentenabled indicates whether Software Restriction Policies (SRP) are turned on to restrict unauthorized software. Malicious code usually subverts these restrictions by fooling the SRP code, making queries of this key return an error value. These observations suggest that all three metrics are feasible for the security importance assessment of registry objects.
6.4.2. Security importance of system objects by categories
It is difficult to separate registry objects into categories according to either their paths or their functionalities. Thus, we only examine file and process objects by category.
6.4.2.1. Process types. To examine the importance metrics for process objects in detail, we categorize processes according to their functionalities. We manually separate processes into seven categories: system processes, security processes, web browser processes, networking processes, office processes, update processes, and general service processes. Fig. 12 illustrates statistics of the importance values of processes in each category; more precisely, we present the maximum, mean, median and minimum importance values of all processes in each category. To save space, we focus on the results under the PageRank metric. From Fig. 12, we make the following observations. 1) The maximum and minimum importance values of system processes are the highest among all categories. This is consistent with the critical functionalities of the most important system processes. 2) The maximum and minimum importance values of security processes are also high, which is consistent with their role in protecting the system. 3) Web browser processes have the highest mean and median importance values. This is also understandable, since they are among the most frequently used processes, which makes many system and user files dependent on web browsers. 4) Office processes, such as winword.exe, outlook.exe and photoshop.exe, access users' personal files rather than critical resources; hence, their importance values are moderate. 5) General service processes, such as lingoes.exe, kmplayer.exe, and fplogonserv.exe, provide auxiliary services for the system, and their importance values are relatively low. The consistency between the quantified importance values of process objects and their functionalities supports the feasibility and correctness of the PageRank metric for process objects.
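The per-category statistics reported in Fig. 12 can be reproduced with a few lines of pandas, sketched below on made-up scores; the column names and example rows are ours, not the paper's data.

```python
# Sketch of the per-category importance statistics (max/mean/median/min).
import pandas as pd

df = pd.DataFrame({
    "process":  ["svchost.exe", "explorer.exe", "ekrn.exe",
                 "chrome.exe", "winword.exe", "kmplayer.exe"],
    "category": ["system", "system", "security",
                 "web browser", "office", "general service"],
    "pagerank": [0.031, 0.024, 0.018, 0.012, 0.004, 0.001],
})

stats = (df.groupby("category")["pagerank"]
           .agg(["max", "mean", "median", "min"])
           .sort_values("mean", ascending=False))
print(stats)
```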
Fig. 12 – Importance values of processes in each category under PageRank metric. Sys stands for system processes, WB stands for web browsers, Net stands for other networking applications.

6.4.2.2. File types. We separate file objects into different types according to their extensions, and analyze the statistics of their importance values within each type. The file types we are interested in are dll, ini, sys, exe, bat, log, jar, xml, tmp, png, gif, pdf, doc, zip, rar, bak, ppt, and cpp. These are common file types, so we are able to compare the quantified importance values of these files with their roles. To save space, we focus on the results under the PageRank metric. We observe that the importance values of file objects under the importance metrics are highly right-skewed. The skewness arises because most files are program-specific and thus not important from a system-wide perspective. To avoid the influence of this skewness, we focus on the 50% most important file objects within each file type. Fig. 13 illustrates the maximum, mean, median and minimum importance values, under the PageRank metric, of files in each common file type. Taking dll as an example, we select the 50% most important dll files under the PageRank metric, and present the maximum, mean, median and minimum importance values of these selected files. As shown in Fig. 13, the importance values differ across file types. The most important file type is dll in terms of the mean value, which is consistent with the role of dll files in supporting the running of processes. Among all dll files, as shown in the previous subsection, dll files in the windows/system32 directory have very high importance values. Some program-specific dll files are in the middle, such as mso.dll (office), dwdcw20.dll (office), and loadwdui.dll (security). The least important dll files are those related to unimportant programs, such as qzonemusic.dll (entertainment) and adobepim.dll. Sys and ini files correspond to system services or support the running of processes, while exe and bat files are usually the executable files of processes; they also deserve high importance values. Personal document files, such as pdf, doc, ppt, zip, and xml, have similar and moderate importance values as a result of their nonessential roles from a system-wide perspective, although they may be important for the user. Furthermore, picture files (gif and png), backup files (bak) and temporary files (tmp) have low importance values. The consistency between the quantified importance values of file objects and their roles supports the feasibility and correctness of the PageRank metric for file objects.

Fig. 13 – Importance values of files in each file type under PageRank metric.

6.5. Behavioral discrimination under importance metrics

The importance metric based behavioral profile, proposed in Section 5, encodes security policies for access behaviors, which allows us to discriminate malicious processes from benign ones with respect to security policies. In this subsection, we evaluate the feasibility of the importance metric based behavioral profiles against common security policies, and examine their discriminative power via statistical hypothesis testing in order to provide evidence for malware detection.

6.5.1. Security policies with respect to integrity

Accurate importance metrics for system objects with respect to integrity should be discriminative for benign and malicious processes under Policy 5.1.
6.5.1.1. Highest ranking position of written objects. We investigate the first policy in Policy 5.1, NWU, by examining the highest ranking position among all objects written by a process. Under this policy, strictly speaking, a benign process should not write to objects more important than itself. We assume each process is the least important object among all system objects, and treat the highest ranking position among all objects written by a process as an indicator of the process's violation of this policy. The box plots in Fig. 14 show the highest ranking positions among the written objects of benign and malicious processes. Each box, from bottom to top, presents the 5th, 25th, 50th, 75th and 95th percentiles. Fig. 14a–c illustrates the highest ranking positions of written file objects, while Fig. 14d–f illustrates the highest ranking positions of written registry objects. As shown in Fig. 14, malicious processes are more likely than benign processes to write file or registry objects at higher ranking positions, under any of the three importance metrics. For example, as shown in Fig. 14c, 75% of malicious processes write files at higher ranking positions than 95% of benign processes do. Meanwhile, greater discrimination is observed on file objects than on registry objects. The discrimination between benign and malicious processes under this indicator of the violation of the no-write-up policy suggests the feasibility of our importance metrics and their ability to detect malware.

Fig. 14 – Highest ranking positions among all writing objects under the importance metrics. Bottom indicates higher ranking.

6.5.1.2. No read down and no write up. Furthermore, we examine the second policy in Policy 5.1 (NRD and NWU) by investigating violations among all written objects. A benign process should obey NRD and NWU simultaneously, and a violation happens when the ranking position of a written object is higher than the lowest ranking position among all read objects. The box plots in Fig. 15 show the fraction of violating writes among all write behaviors of benign and malicious processes. Fig. 15a–c illustrates the fraction of violating writes on file objects, while Fig. 15d–f illustrates the fraction of violating writes on registry objects. In Fig. 15a–c, we observe differences between benign and malicious processes in the fraction of violations under the in-degree metric and the PageRank metric for file objects. The PageRank metric, as shown in Fig. 15c, exhibits greater discrimination than the other two metrics: at least 75% of malicious processes exhibit higher fractions of violations than 75% of benign processes. However, on registry objects, as shown in Fig. 15d–f, benign processes are surprisingly more likely to violate the second policy in Policy 5.1, and there is no discrimination between benign and malicious processes. There are two possible reasons: 1) all three importance metrics fail to assess the importance of registry objects with respect to integrity; 2) accessing registry objects does not obey NRD and NWU simultaneously.

Fig. 15 – Violations of Policy 5.1 under the importance metrics.

We observe discrimination between benign and malicious processes under two common security policies with respect to integrity, as shown in Fig. 14 and Fig. 15. These observations demonstrate the feasibility of our importance metrics for assessing the importance of system objects with respect to integrity, and their ability to detect malware with the help of appropriate security policies. However, simple security policies such as Policy 5.1 are so rigid that they introduce many false positives in practice. Next, we explore behavioral discrimination under the importance metrics via statistical analysis to see whether more fine-grained policies can be learned with the help of statistical classifiers.
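A minimal sketch of the two indicators used above is given below, assuming each process is summarized by the ranking positions (1 = most important) of the objects it reads and writes; the data layout, function names, and toy values are ours.

```python
# Policy 5.1 indicators, assuming rank 1 = most important object.

def highest_written_rank(writes):
    """NWU indicator: best (numerically smallest) rank among written objects."""
    return min(writes) if writes else None

def violation_fraction(reads, writes):
    """NRD+NWU indicator: fraction of writes to objects ranked higher
    (more important) than the least important object that was read."""
    if not reads or not writes:
        return 0.0
    least_important_read = max(reads)            # largest rank index among reads
    violations = sum(1 for w in writes if w < least_important_read)
    return violations / len(writes)

# Toy process: reads two mid-ranked objects, writes a top-ranked DLL and a temp file.
reads, writes = [500, 350], [3, 9800]
print(highest_written_rank(writes))       # 3   -> writes to a very important object
print(violation_fraction(reads, writes))  # 0.5 -> half of the writes violate the policy
```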
Fig. 16 – p-values of KS tests on P(X_r | L = benign) and P(X_r | L = malicious) under PageRank.
6.5.2. Statistical discrimination
The importance metric based behavioral profile defined in Section 5 allows us to explore the behavioral discrimination of processes via statistical hypothesis testing, such as Kolmogorov–Smirnov (KS) tests. Regarding each ranking position in the behavioral profile as a candidate feature for distinguishing malicious behavior, we conduct KS tests for objects at each ranking position, and examine their discrimination in terms of p-values. More specifically, for each ranking position r we quantify the significance of the distance between the two empirical distributions P(X_r | L = benign) and P(X_r | L = malicious), where X_r is a random variable indicating the number of accessed objects at ranking position r. Fig. 16 illustrates the p-values of the KS tests for each ranking position r, where a lower p-value indicates stronger statistical discrimination between benign and malicious processes in the access behaviors on objects at the corresponding ranking position. In Fig. 16, we observe significant discrimination at many ranking positions under our importance metrics. This demonstrates the statistical discrimination between benign and malicious processes in the importance metric based behavioral profiles. Given such a large number of discriminative ranking positions, we are able to leverage statistical classifiers to automatically learn policies for malware detection, as described in Section 5.
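The per-position KS tests can be sketched as follows with scipy, assuming the behavioral profiles are stored as count matrices whose columns are ranking positions; the matrices below are synthetic, not our dataset.

```python
# Per-ranking-position two-sample KS tests between benign and malicious profiles.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
benign    = rng.poisson(lam=2.0, size=(200, 50))   # 200 benign processes, 50 positions
malicious = rng.poisson(lam=3.5, size=(80, 50))    # 80 malicious processes

p_values = np.array([
    ks_2samp(benign[:, r], malicious[:, r]).pvalue
    for r in range(benign.shape[1])
])
print((p_values < 0.01).sum(), "of", len(p_values),
      "ranking positions show significant discrimination")
```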
6.6. Importance metric based malware detection
After exploring the behavioral discriminations of benign and malicious processes, we evaluate the importance metric based malware detection.
6.6.1. Experimental settings
According to Policy 5.1 and our observations in Section 6.5, we examine the capability of our malware detection model with three types of feature vectors as follows. 1) S1: Only “write” on file objects. 2) S2: Both “read” and “write” on file objects. 3) S3: Both “read” and “write” on both file and registry objects.
Each experiment consisted of eight sub-experiments. In each sub-experiment, we selected one Windows XP user and treated his or her access traces as the benign testing instances. The access traces of the remaining seven Windows XP users were treated as the benign training instances. For the malicious instances, we randomly selected 80% of the access traces of malicious processes as the malicious training instances, and treated the remaining malicious access traces as the malicious testing instances. We employed three classifiers, kNN, logistic regression, and random forests, to explore the capability of our importance metric based malware detection. We ran ten-fold cross-validation on the benign and malicious training sets to obtain the best hyper-parameters of the classifiers, and evaluated the performance on the testing set. We repeated each sub-experiment ten times to account for the randomness of the selected malicious training instances. Furthermore, to evaluate the ability of our approach to detect new malware samples, we trained the classifier on the first dataset, from VxHeaven, with ten-fold cross-validation and evaluated it on the second dataset, from MALICA.
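Under our reading of this setup, one sub-experiment could be sketched with scikit-learn as below; the feature extraction is abstracted into pre-computed arrays, and all names (run_sub_experiment, X_benign_by_user, the hyper-parameter grid) are illustrative assumptions rather than the paper's implementation.

```python
# One leave-one-user-out sub-experiment: hold out one user's benign traces,
# train on the rest plus 80% of the malware traces, report the test AUC.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import roc_auc_score

def run_sub_experiment(X_benign_by_user, X_malicious, test_user, seed=0):
    X_ben_test = X_benign_by_user[test_user]
    X_ben_train = np.vstack([x for u, x in X_benign_by_user.items() if u != test_user])
    X_mal_train, X_mal_test = train_test_split(X_malicious, train_size=0.8,
                                               random_state=seed)

    X_train = np.vstack([X_ben_train, X_mal_train])
    y_train = np.r_[np.zeros(len(X_ben_train)), np.ones(len(X_mal_train))]
    X_test = np.vstack([X_ben_test, X_mal_test])
    y_test = np.r_[np.zeros(len(X_ben_test)), np.ones(len(X_mal_test))]

    # Ten-fold cross-validation to pick hyper-parameters, then evaluate the AUC.
    search = GridSearchCV(RandomForestClassifier(random_state=seed),
                          {"n_estimators": [100, 300]}, cv=10, scoring="roc_auc")
    search.fit(X_train, y_train)
    scores = search.predict_proba(X_test)[:, 1]
    return roc_auc_score(y_test, scores)
```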
6.6.2. Comparison of importance metrics and classifiers
Fig. 17 illustrates the AUCs of the eight sub-experiments with different settings, in terms of box plots. The left three boxes (blue) show the performance when examining only "write" on files (S1) under the different importance metrics, the middle three boxes (red) show the performance when examining both "read" and "write" on files (S2), and the right three boxes (green) show the performance when examining both "read" and "write" on files and registries (S3). We observe that among malware detectors with S1 and the kNN classifier, the PageRank metric achieves the best performance and the authority metric the worst. Among malware detectors with S1 and either the logistic regression or the random forests classifier, the PageRank metric achieves the best performance and the in-degree metric the worst. Meanwhile, improvements are always observed with S2 or S3 when compared with S1. These observations are confirmed with Wilcoxon signed-rank tests at the 0.05 level.
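The paired significance test mentioned above can be sketched as follows; the AUC values are made up solely to illustrate the call to scipy's Wilcoxon signed-rank test on the eight paired sub-experiment results.

```python
# Paired comparison of two detector settings across the eight sub-experiments.
from scipy.stats import wilcoxon

auc_setting_a = [0.91, 0.88, 0.93, 0.90, 0.86, 0.94, 0.89, 0.87]  # e.g., S1
auc_setting_b = [0.96, 0.93, 0.97, 0.95, 0.92, 0.97, 0.94, 0.93]  # e.g., S2

stat, p = wilcoxon(auc_setting_a, auc_setting_b)
print(f"p = {p:.4f}; difference significant at 0.05: {p < 0.05}")
```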
Among all malware detectors with S2, there is no significant difference in performance, except that the in-degree metric performs worse than the authority and PageRank metrics with the logistic regression classifier. There is no significant difference among the detectors with S3. Furthermore, we observe that the detector with the random forests classifier achieves statistically better performance than those with the kNN and logistic regression classifiers. However, involving registry objects does not significantly improve malware detection performance.

Fig. 17 – AUCs of eight sub-experiments under different importance metrics, classifiers and features.

6.6.3. Detailed performance

To better understand the performance of the malware detector, Table 4 details the performance of the malware detectors with random forests under the PageRank metric, in terms of true positive rates at specific false positive rates in the eight sub-experiments. The experimental settings are described in Section 6.6.1. We refer to a sub-experiment as "UX", where "X" is the id of the selected testing user. Although involving registry objects (S3) yields no significant improvement, it still achieves the best performance on average. The encouraging results, such as an 83.11% TPR at 0% FPR and a 93.92% TPR at 0.1% FPR, suggest the feasibility of the importance metrics and the effectiveness of importance metric based malware detection. Furthermore, these results imply the correctness of the importance assessment.

6.6.4. Comparisons with other techniques
To compare our model with existing dynamic analysis based malware detection, we employed two baselines: 1) a free online sandbox, Comodo Instant Malware Analysis (CIMA) (Comodo, 2013); 2) a system call-based malware detector using a statistical classifier (Canali et al., 2012). CIMA provides dynamic analysis of executable files and reports the detected behaviors on files, registries, processes, network resources, etc. According to its own definition of the maliciousness of detected behaviors, it reports a final maliciousness level for each sample, e.g., undetected, suspicious, suspicious+, suspicious++. We assume CIMA treats an executable file as benign if it does not report it as suspicious or above. Submitting our malware samples to this online service yields a true positive rate of 73.24% (5315/7257). For the false positives, we collected all applications that appeared in our benign dataset; CIMA achieves a 5.37% (24/447) false positive rate. As a comparison, our model achieves a 93.92% TPR at 0.1% FPR. Meanwhile, we employed a baseline model using a bag of operations/actions on files, registries and processes with arguments. We evaluated the baseline model on our dataset with random forests, since it achieved better performance than the other two statistical classifiers. Fig. 18 illustrates the AUCs of the baseline model and our approach (S3, PageRank, random forests) in each sub-experiment. Since we repeat each sub-experiment ten times, each box shows the distribution of the AUCs of the ten repetitions. As shown in Fig. 18, our model significantly outperforms the bag-of-operations baseline.
6.6.5. Detection of new malware
Furthermore, we trained random forests under the PageRank metric and tuned the hyper-parameters via ten-fold cross-validation on our first dataset, from VxHeaven. Our approach detects 225 out of 234 (96.15%) malware samples in the second dataset, which consists of new samples from the MALICA dataset. This result demonstrates the ability of our approach to detect new malware samples.
Table 4 – Performance of malware detectors with different features under PageRank metric with random forests. Entries are true positive rates per sub-experiment (U1–U8) and on average, at each false positive rate.

Features  FPR    U1       U2       U3       U4       U5       U6       U7       U8       Avg. TPR
S1        0%     0%       0%       0%       0%       0%       93.53%   0%       0%       11.69%
S1        0.1%   94.74%   46.39%   91.82%   86.34%   18.70%   95.63%   86.46%   0.94%    65.06%
S1        0.5%   96.0%    96.24%   95.34%   89.16%   93.51%   95.98%   90.01%   4.69%    82.62%
S1        1.0%   97.07%   96.40%   95.86%   95.47%   93.80%   96.06%   92.32%   9.38%    84.55%
S2        0%     85.50%   0%       87.92%   86.14%   99.18%   99.72%   0%       96.15%   69.33%
S2        0.1%   99.58%   70.0%    98.39%   98.06%   99.72%   99.79%   8.21%    97.17%   83.87%
S2        0.5%   99.79%   99.39%   99.66%   99.26%   99.73%   100%     41.03%   99.29%   92.27%
S2        1.0%   99.82%   99.86%   99.73%   99.80%   99.79%   100%     82.05%   99.38%   97.55%
S3        0%     99.5%    82.35%   97.59%   98.23%   88.11%   99.42%   0%       99.71%   83.11%
S3        0.1%   100%     90.02%   99.22%   99.80%   92.88%   99.51%   70.21%   99.70%   93.92%
S3        0.5%   100%     99.55%   99.81%   100%     99.55%   99.63%   95.33%   99.72%   99.20%
S3        1.0%   100%     99.80%   100%     100%     99.76%   99.56%   99.08%   99.80%   99.75%
Fig. 18 – AUCs of the baseline model and our approach in repetitions of each sub-experiment.

The performance analysis of our malware detection method in this section further demonstrates the feasibility of our networked approach to security importance assessment.

7. Discussion

7.1. Importance in confidentiality
Confidentiality protection aims to ensure that data are never disclosed to unauthorized individuals; it is usually applied in military or government systems. Although we focus on assessing importance with respect to integrity in this paper, centrality metrics can be easily extended to assess the importance of objects with respect to confidentiality. Analogous to importance in integrity, we refer to the importance of a system object in confidentiality as the amount of influence its disclosure could have on the confidentiality of other objects. Considering the basic example in Fig. 1a, if p is disclosed, the information it reads from f is threatened with disclosure, and thus f is threatened with disclosure. On the other hand, for a write behavior as shown in Fig. 1b, if f is disclosed, the information written by p is threatened with disclosure, and so is p. The influence of a confidentiality compromise propagates along paths in the security dependency network as well; however, it propagates in the direction opposite to that of an integrity compromise. Thus, if we construct another network with the same vertices and the same edges as the integrity dependency network but with reversed directions, we obtain analogous centrality metrics for the importance assessment of objects with respect to confidentiality. More precisely, the in-degree centrality, the authority centrality, and the PageRank centrality on the reversed integrity dependency network provide the importance metrics for objects with respect to confidentiality.
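A minimal sketch of this construction is shown below, assuming the integrity dependency network is available as a networkx DiGraph; the toy graph and function name are ours.

```python
# Confidentiality importance: the same centrality metrics computed on the
# reversed integrity dependency network.
import networkx as nx

def confidentiality_importance(G, alpha=0.85):
    R = G.reverse(copy=True)                 # flip every edge direction
    in_degree = dict(R.in_degree())
    _, authority = nx.hits(R)
    pagerank = nx.pagerank(R, alpha=alpha)
    return in_degree, authority, pagerank

# Toy usage: p reads f (edge p -> f), p writes g (edge g -> p).
G = nx.DiGraph([("p", "f"), ("g", "p")])
print(confidentiality_importance(G)[2])      # PageRank on the reversed network
```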
7.2. Generalization to other systems

Taking advantage of security dependency relationships, we propose importance metrics to assess the importance of objects from a defensive and system-wide perspective, and evaluate the importance metrics on two Windows systems, XP SP3 and Windows 7. Our approach can be applied to other systems as well, such as Windows 10, Linux, Android, and iOS, since the security dependency relationships defined in this paper are not system dependent. Moreover, administrators can define their own relationships between pairs of system objects according to their security requirements, such as confidentiality, construct a network connecting system objects system-wide, and design network structure based importance metrics to assess the importance of objects. Furthermore, the derived importance metrics provide a way of describing behavioral profiles for malware detection and prevention in the corresponding system.

7.3. Prioritizing protection
Effective protection of real operating systems greatly benefits from accurate protection of important system objects (Fattori et al., 2015; Lanzi et al., 2010). Our importance metrics provide a basis for prioritizing efforts to devise fine-grained security policies for system objects. The conference version of this work demonstrated that, compared with a full protection model, protecting only a fraction of the most important system objects achieved similar, and sometimes even better, performance in malware detection (Mao et al., 2014). Thus, we suggest that administrators pay more attention to the more important system objects when devising fine-grained security policies.
7.4. Limitations
The security importance analysis relies on the observation of access traces of system-wide benign processes. Meanwhile, the security importance of system objects in one operating system may differ from that in other operating systems. Thus, if an attacker observes the access traces of system-wide benign processes on a targeted operating system, the attacker could learn the security importance of system objects in that system and craft specific malware to evade detection. However, acquiring access information at the system level will not go unnoticed, which makes such malware unstealthy and easy to detect. Meanwhile, this paper considers malware or attacks that compromise the operating system via operations on files, registries or processes. Thus, our method will fail if malware or an attack compromises the operating system via memory operations or hardware vulnerabilities. However, with the help of cutting-edge technologies in dynamic data flow tracking (Jee et al., 2012; Kemerlis et al., 2012), our security importance analysis can be further extended to other fine-grained system objects, such as file bytes, kernel modules, and memory, by investigating the interactions between them. Since this is not the goal of this paper, we leave it as future work.
Moreover, the security dependency relationship defined in this paper takes into account access events only. Security dependency relationships also exist among objects in other forms, such as function calls, or may only be observable at the very early stage of system startup. By investigating the dependency relationships involved in various types of events or instructions, and by deploying the monitor as early as possible, the security dependency relationship can be extended and more objects can be involved in the future.
8. Conclusion
In this paper, we build a security dependency network from access behaviors to quantify the security importance of system objects from a system-wide perspective. The security dependency network exhibits power-law degree distributions and small-world effect properties. Meanwhile, exploring networked structures in the dependency network provides us insights into the importance of system objects in security. We take advantage of centrality metrics to assess the importance of system objects with respect to integrity, and evaluate the importance metrics of system objects from various perspectives. Moreover, we propose an importance metric based behavioral model for malware detection. Experimental results on a real-world dataset demonstrate the feasibility of the importance metrics and the efficacy of our malware detection method leveraging importance metrics.
Acknowledgments

The authors thank VirusTotal for providing online services. This research is supported by NSFC (61175039, 61221063, 61375040).
REFERENCES
Agrawal G. Simultaneous demand-driven data-flow and call graph analysis. In: Proceedings of IEEE international conference on software maintenance, 1999 (ICSM’99). IEEE; 1999. pp. 453–462. Apap F, Honig A, Hershkop S, Eskin E, Stolfo S. Detecting malicious software by monitoring anomalous windows registry accesses. In: Proceedings of the 5th international conference on recent advances in intrusion detection (RAID’02). Berlin, Heidelberg: Springer-Verlag; 2002. pp. 36–53. ISBN 3-540-00020-8. Bell DE, LaPadula LJ. Secure computer systems: mathematical foundations. ESD-TR 73-278, MITRE Corp., 1973. Bhatkar S, Chaturvedi A, Sekar R. Dataflow anomaly detection. In: IEEE symposium on security and privacy (S&P), pp. 48–62; 2006. ISSN 1081-6011. http://doi.ieeecomputersociety.org/ 10.1109/SP.2006.12. Biba. Integrity considerations for secure computer systems. ESDTR 76-372, MITRE Corp., 1977. Borgonovo E. A new uncertainty importance measure. Reliab Eng Syst Saf 2007;92(6):771–84. Brin S, Page L. The anatomy of a large-scale hypertextual web search engine. Comput Networks ISDN 1998;30(1–7):107–17. doi:10.1016/S0169-7552(98)00110-X. ISSN 0169-7552.
Canali D, Lanzi A, Balzarotti D, Kruegel C, Christodorescu M, Kirda E. A quantitative study of accuracy in system call-based malware detection. In: Proceedings of the 2012 international symposium on software testing and analysis. ACM; 2012. pp. 122–132. Clauset A, Shalizi CR, Newman MEJ. Power-law distributions in empirical data. SIAM Rev 2009;51(4):661–703. doi:10.1137/ 070710111. http://dx.doi.org/10.1137/070710111. Cogswell B, Russinovich M. Process monitor. 2014. Available at: http://technet.microsoft.com/en-us/sysinternals/bb896645. [Accessed 10 February 2014]. Comodo. Comodo Instant Malware Analysis. 2013. Available at: http://camas.comodo.com. [Accessed 20 December 2013]. Concas G, Marchesi M, Pinna S, Serra N. Power-laws in a large object-oriented software system. IEEE T Softw Eng 2007;33:687–708. http://doi.ieeecomputersociety.org/10.1109/ TSE.2007.1019. ISSN 0098-5589. Fattori A, Lanzi A, Balzarotti D, Kirda E. Hypervisor-based malware protection with accessminer. Comput Secur 2015;52:33–50. Feng Q, Prakash A, Yin H, Lin Z. Mace: high-coverage and robust memory analysis for commodity operating systems. In: Proceedings of the 30th annual computer security applications conference. ACM; 2014. pp. 196–205. Forrest S, Hofmeyr SA, Somayaji A, Longstaff TA. A sense of self for unix processes. In: IEEE symposium on security and privacy (S&P), pp. 120–128; 1996. Fraser T. Lomac: Low water-mark integrity protection for cots environments. In: IEEE symposium on security and privacy (S&P), pp. 230–245; 2000. Fredrikson M, Jha S, Christodorescu M, Sailer R, Yan X. Synthesizing near-optimal malware specifications from suspicious behaviors. In: IEEE symposium on security and privacy (S&P), pp. 45–60; 2010. ISSN 1081-6011. http:// doi.ieeecomputersociety.org/10.1109/SP.2010.11. Gillespie CS. Fitting heavy tailed distributions: the poweRlaw package. J Stat Softw 2015;64(2):1–16. http://www.jstatsoft.org/ v64/i02/. Hatton L. Power-law distributions of component size in general software systems. IEEE T Softw Eng 2009;35(4):566–72. Heller KA, Svore KM, Keromytis AD, Stolfo SJ. One class support vector machines for detecting anomalous windows registry accesses. In: Proc. of the workshop on data mining for computer security; 2003. Jang J-W, Woo J, Yun J, Kim HK. Mal-netminer: malware classification based on social network analysis of call graph. In: Proceedings of the companion publication of the 23rd international conference on world wide web companion, pp. 731–734; 2014. Jee K, Portokalidis G, Kemerlis VP, Ghosh S, August DI, Keromytis AD. A general approach for efficiently accelerating softwarebased dynamic data flow tracking on commodity hardware. In: NDSS; 2012. Kemerlis VP, Portokalidis G, Jee K, Keromytis AD. Libdft: Practical dynamic data flow tracking for commodity systems. In: Proceedings of the 8th ACM SIGPLAN/SIGOPS conference on virtual execution environments (VEE ’12). New York, NY, USA: ACM; 2012. pp. 121–132. ISBN 978-1-4503-1176-2. doi:10.1145/ 2151024.2151042. http://doi.acm.org/10.1145/2151024.2151042. King ST, Chen PM. Backtracking intrusions. ACM Trans Comput Syst 2005;23:51–76. http://doi.acm.org/10.1145/1047915 .1047918. ISSN 0734-2071. Kleinberg JM. Authoritative sources in a hyperlinked environment. J ACM 1999;46:604–32. http://doi.acm.org/ 10.1145/324133.324140. ISSN 0004-5411. Klemm K, Eguíluz VM, San Miguel M. Scaling in the structure of directory trees in a computer cluster. Phys Rev Lett 2005;95(12):128701.
Klemm K, Eguluz VM, San Miguel M. Analysis of attachment models for directory and file trees. Physica D 2006;224:149–55. ISSN 0167-2789. Kwon BJ, Mondal J, Jang J, Bilge L, Dumitras T. The dropper effect: insights into malware distribution with downloader graph analytics. In: Proceedings of the 22nd ACM SIGSAC conference on computer and communications security. ACM; 2015. pp. 1118–1129. Lanzi A, Balzarotti D, Kruegel C, Christodorescu M, Kirda E. Accessminer: using system-centric models for malware protection. In: Proceedings of the 17th ACM conference on computer and communications security (CCS). ACM; 2010. pp. 399–412. ISBN 978-1-4503-0245-6. http://doi.acm.org/10.1145/ 1866307.1866353. Mao W, Cai Z, Guan X, Towsley D. Centrality metrics of importance in access behaviors and malware detections. In: Proceedings of the 30th annual computer security applications conference (ACSAC ’14). ACM; 2014. Mao Z, Li N, Chen H, Jiang X. Combining discretionary policy with mandatory information flow in operating systems. ACM T Inform Syst Sec 2011;14(3):24. Martignoni L, Stinson E, Fredrikson M, Jha S, Mitchell J. A layered architecture for detecting malicious behaviors. Recent advances in intrusion detection (RAID), vol. 5230. 2008. pp. 78–97. Milo R, Shen-Orr S, Itzkovitz S, Kashtan N, Chklovskii D, Alon U. Network motifs: simple building blocks of complex networks. Science 2002;298(5594):824–7. MICROSOFT. What is the windows integrity mechanism? 2014. Available at: https://msdn.microsoft.com/en-us/library/ bb625957.aspx. [Accessed 15 April 2014]. Myers CR. Software systems as complex networks: structure, function, and evolvability of software collaboration graphs. Phys Rev E 2003;68(4):046116. Nappa A, Zubair Rafique M, Caballero J. Driving in the cloud: an analysis of drive-by download operations and abuse reporting. Detection of intrusions and malware, and vulnerability assessment. Springer; 2013. pp. 1–20. Newman M. Networks: an introduction. Oxford: Oxford University Press; 2010. Sahinoglu M. Security meter: a practical decision-tree model to quantify risk. IEEE Secur Priv 2005;3(3):18–24. Sun W, Sekar R, Liang Z, Venkatakrishnan VN. Expanding malware defense by securing software installations. Detection of intrusions and malware, and vulnerability assessment. Springer; 2008. pp. 164–185. Sun W, Sekar R, Poothia G, Karandikar T. Practical proactive integrity preservation: a basis for malware defense. In: IEEE symposium on security and privacy (S&P), pp. 248–262; 2008. http://doi.ieeecomputersociety.org/10.1109/SP.2008.35. Sze WK, Sekar R. A portable user-level approach for system-wide integrity protection. In: Proceedings of the 29th annual computer security applications conference (ACSAC ’13). ACM; 2013. pp. 219–228. Sze WK, Sekar R. Provenance-based integrity protection for windows. In: Proceedings of the 31st annual computer security applications conference. ACM; 2015. pp. 211–220. Tong H, Aditya Prakash B, Tsourakakis C, Eliassi-Rad T, Faloutsos C, Chau DH. On the vulnerability of large graphs. In: 2010 IEEE 10th international conference on data mining (ICDM). IEEE; 2010. pp. 1091–1096. Vijayakumar H, Schiffman J, Jaeger T. Integrity walls: finding attack surfaces from mandatory access control policies. In: 7th ACM symposium on information, computer, and communications security (ASIACCS); 2012. VirusTotal. Free Online Virus, Malware and URL Scanner. 2016. Available at: http://www.virustotal.com/. [Accessed 20 June 2016].
VXHeaven. Virus Collection. 2010. Available at: http://vx .netlux.org/. [Accessed 13 October 2010]. Wikipedia. Importance. 2015. Available at: https://en.wikipedia .org/wiki/importance. [Accessed 29 December 2015]. Wüchner T, Ochoa M, Pretschner A. Malware detection with quantitative data flow graphs. In: Proceedings of the 9th ACM SIGSAC symposium on Information, computer and communications security. ACM; 2014. Wüchner T, Pretschner A, Ochoa M. DAVAST: data-centric system level activity visualization. In: Proceedings of the eleventh workshop on visualization for cyber security. New York, NY, USA: ACM; 2014. pp. 25–32. ISBN 978-1-4503-2826-5. Wüchner T, Ochoa M, Pretschner A. Robust and effective malware detection through quantitative data flow graph metrics. Detection of intrusions and malware, and vulnerability assessment, Lecture notes in computer science, vol. 9148. 2015. pp. 98–118. ISBN 978-3-319-20549-6. Xuan C, Copeland J, Beyah R. Shepherding loadable kernel modules through on-demand emulation. In: International conference on detection of intrusions and malware, and vulnerability assessment. Springer; 2009. pp. 48–67. Yan K-K, Fang G, Bhardwaj N, Alexander RP, Gerstein M. Comparing genomes to computer operating systems in terms of the topology and evolution of their regulatory control networks. Proceedings of the National Academy of Sciences, 2010. Weixuan Mao received his B.S. degree in computer science and engineering and Ph.D. degree in control science and engineering from Xi’an Jiaotong University, Xi’an, China, in 2009 and 2017, respectively. His research interests include intrusion/malware detection, data driven security analysis. Zhongmin Cai received his B.S. degree in automatic control and Ph.D. degree in systems engineering from Xi’an Jiaotong University, Xi’an, China, in 1998 and 2004, respectively. He is currently a Professor at the School of Electronic and Information Engineering, Xi’an Jiaotong University. His research interests include Internet security, HCI behavior analysis, and machine learning. Don Towsley received his B.A. degree in physics and Ph.D. in computer science from the University of Texas at Austin, Austin, TX, USA, in 1971 and 1975, respectively. He is currently a Distinguished Professor at the College of Information and Computer Sciences, University of Massachusetts, Amherst. He served as Editorin-Chief of the IEEE/ACM TRANSACTIONS ON NETWORKING, and has previously served on numerous Editorial Boards. He has received the 2007 ACM SIGMETRICS Achievement Award, the 2008 ACM SIGCOMM Achievement Award, and numerous best conference/ workshop paper awards. His research interests include networks and performance evaluation. Qian Feng received her B.S. and M.S. degrees from Xi’an Jiaotong University, Xi’an, China, in 2008 and 2011, respectively. She is currently pursuing her Ph.D. degree with Syracuse University. Her research interests include malware detection and analysis, memory forensic analysis. Xiaohong Guan received his B.S. and M.S. degrees in automatic control from Tsinghua University, Beijing, China, and Ph.D. degree in electrical engineering from the University of Connecticut, Storrs, CT, USA, in 1982, 1985, and 1993, respectively. Since 1995, he has been with the Systems Engineering Institute, Xi’an Jiaotong University, and was a Cheung Kong Professor of Systems Engineering in 1999 and the Dean of the School of Electronic and Information Engineering in 2008. 
His research interests include allocation and scheduling of complex networked resources, network security, and sensor networks.