computers & security 55 (2015) 142–158
On the ground truth problem of malicious DNS traffic analysis

Matija Stevanovic a,*, Jens Myrup Pedersen a, Alessandro D'Alconzo b, Stefan Ruehrup b, Andreas Berger b

a Department of Electronic Systems, Aalborg University, Aalborg, Denmark
b Forschungszentrum Telekommunikation Wien (FTW), Vienna, Austria
ARTICLE INFO

Article history:
Received 27 March 2015
Received in revised form 5 July 2015
Accepted 7 September 2015
Available online 15 September 2015

Keywords:
DNS
Traffic analysis
Ground truth
Data labeling
Blacklists
Whitelists

ABSTRACT

DNS is often abused by Internet criminals in order to provide flexible and resilient hosting of malicious content and reliable communication within their network architecture. The majority of detection methods targeting malicious DNS traffic are data-driven, most commonly having machine learning algorithms at their core. These methods require accurate ground truth of both malicious and benign DNS traffic for model training as well as for performance evaluation. This paper elaborates on the problem of obtaining such a ground truth and evaluates practices employed by contemporary detection methods. Building upon the evaluation results, we propose a novel semi-manual labeling practice targeting agile DNS mappings, i.e. DNS queries that are used to reach a potentially malicious server characterized by fast-changing domain names and/or IP addresses. The proposed approach is developed with the purpose of obtaining ground truth by incorporating the operator's insight in an efficient and effective manner. We evaluate the proposed approach in a case study based on DNS traffic from an ISP network by comparing it with the popular labeling practices that rely on domain name and IP blacklists and whitelisting of popular domains. The evaluation indicates challenges and limitations of relying on existing labeling practices and shows a clear advantage of using the proposed approach in discovering a more complete set of potentially malicious domains and IP addresses. Furthermore, the novel approach attains time-efficient labeling with limited operator involvement, and is thus promising in view of adoption in operational ISP networks.

© 2015 Elsevier Ltd. All rights reserved.
1. Introduction
The Domain Name System (DNS) is a core component of the Internet that provides flexible decoupling of a service's domain name and the hosting IP addresses (Mockapetris, 1987). However, in addition to its crucial role in the functioning of benign Internet-based services, DNS is often abused by cyber criminals. Those criminals rely on DNS to provide flexible and resilient communication between compromised end-user machines and malicious infrastructure. For instance, DNS is commonly used by malware for discovering C&C (Command and Control) infrastructure, while spammers rely on DNS to redirect end-users to exploits or scam/phishing web pages. DNS traffic abused for illegal and malicious purposes by cyber criminals is commonly referred to as "malicious" DNS traffic.
* Corresponding author. Tel.: +45 71242666.
E-mail addresses: [email protected] (M. Stevanovic), [email protected] (J.M. Pedersen), [email protected] (A. D'Alconzo), [email protected] (S. Ruehrup), [email protected] (A. Berger).
http://dx.doi.org/10.1016/j.cose.2015.09.004
0167-4048/© 2015 Elsevier Ltd. All rights reserved.
The detection of malicious DNS traffic has received increasing attention from the research community over the last decade (Silva et al., 2013). As a result, many detection approaches have been proposed, targeting different characteristics of DNS traffic, analyzing DNS traffic at different monitoring points in the network and relying on several analysis techniques (Antonakakis et al., 2010, 2011; Berger and Gansterer, 2013; Bilge et al., 2014; Choi and Lee, 2012; Lee and Lee, 2014; Ma et al., 2009; Perdisci et al., 2009, 2012; Ramachandran et al., 2006; Villamarín-Salomón and Brustoloni, 2008). The majority of detection approaches are data-driven and rely on detection algorithms whose performance depends on the quality of the DNS traffic datasets used for their development, optimization and evaluation (Aviv and Haeberlen, 2011; Sommer and Paxson, 2010). Most commonly, machine learning algorithms (MLAs) are at the core of these methods (Antonakakis et al., 2010, 2011; Bilge et al., 2014; Choi and Lee, 2012; Ma et al., 2009; Perdisci et al., 2009, 2012). MLAs are typically trained using datasets in which the (true positive) elements to be detected are known. This is difficult to achieve in large and changing datasets such as Internet traffic. The key challenge is therefore obtaining accurate and reliable "ground truth" on malicious and benign DNS traffic: it is essential for evaluating the performance of detection methods and, for supervised learning approaches, it also plays a central role in model training. In this paper we evaluate the practices for obtaining the ground truth used by contemporary DNS-based detection methods. In particular, we discuss the popular practice of relying on third-party information, i.e., domain name/IP blacklists and whitelisting of popular domains (Levine, 2010). As these practices define how to label DNS traffic, we refer to them as labeling techniques in the following.
We discuss challenges and pitfalls of existing labeling practices that, in our opinion, are often not adequately addressed by network security practitioners. Building on the findings of several authors who have scrutinized the use of domain (Kührer et al., 2014; Sheng et al., 2009; Sinha et al., 2008) and IP (Dietrich and Rossow, 2009) blacklists, we acknowledge the importance of domain/IP blacklists for the process of traffic labeling, but we also stress that the process of obtaining the ground truth should not rely exclusively on these blacklists, as they often have unknown origins, are formed using various input information and consequently differ in scope. Furthermore, we present a novel semi-manual labeling approach developed with the goal of providing reliable ground truth on "agile" DNS by incorporating the operator's insight in an efficient manner. Agile DNS is a general term covering different dynamic hosting strategies in which the domain names and IP addresses associated with a particular service change over time (Antonakakis et al., 2010; Berger and Gansterer, 2013), such as IP-flux (Holz et al., 2008) and Domain-flux (Yadav et al., 2010). We focus on agile DNS because many modern detection approaches (Antonakakis et al., 2010, 2011; Bilge et al., 2014; Perdisci et al., 2012) target agile DNS traffic, and because it is widely abused by cyber criminals to evade existing detection methods and take-down techniques, thus providing resilience of malicious services and communication. In order to evaluate the novel labeling approach we perform a case study using DNS traffic traces from a regional ISP. We evaluate the proposed labeling approach against conventional labeling techniques based on domain/IP blacklists
and whitelisting of popular domains. The results point out some of the limitations and pitfalls of conventional labeling approaches, as well as the capability of the proposed approach to discover a comprehensive set of malicious domain-to-IP mappings in an efficient and timely manner. The rest of the paper is organized as follows. Section 2 provides background on DNS and existing DNS-based detection approaches. Section 3 reviews labeling techniques employed by the state-of-the-art methods for identifying malicious DNS traffic. Section 4 introduces a novel semi-manual labeling approach. Section 5 evaluates the performance of the proposed labeling approach in a case study of DNS traffic from a regional ISP network. Section 6 discusses evaluation results, outlining possibilities for future work. Finally, Section 7 concludes the paper.
2. Background
DNS provides the decoupling of service names and their corresponding IP address, such that for a given domain name it can resolve a corresponding IP address. To resolve a domain, a host typically needs to consult a local recursive DNS server (RDNS). A recursive server iteratively discovers which Authoritative Name Server (ANS) is responsible for each zone. This results in an iterative querying process that yields the mapping between the requested domain name and its current IP addresses. In the following we refer to domain names as Fully Qualified Domain Names (FQDNs). Furthermore, we refer to the n-th level of an FQDN as n-LD, e.g., the 1-LD (top-level domain, TLD) of www.example.org is org, the 2-LD is example, and the 3-LD is www.
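The n-LD notation above can be illustrated with a short Python sketch. The helper name `nld` is our own; a production tool would consult the Public Suffix List, since a naive label split treats, e.g., co.uk as a 1-LD plus a 2-LD:

```python
# Hypothetical helper illustrating the n-LD notation used in the paper:
# for www.example.org, the 1-LD (TLD) is "org", the 2-LD is "example",
# and the 3-LD is "www".
def nld(fqdn, n):
    """Return the n-th level domain label of an FQDN, or None if absent."""
    labels = fqdn.rstrip(".").lower().split(".")
    # Labels read left to right (www, example, org); the 1-LD is the last one.
    return labels[-n] if n <= len(labels) else None

print(nld("www.example.org", 1))  # org
print(nld("www.example.org", 2))  # example
print(nld("www.example.org", 3))  # www
```

Note that this naive split is only correct for single-label public suffixes; multi-label suffixes require a suffix list.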
2.1. Misuse of DNS
Internet criminals abuse DNS in order to provide flexible and resilient communication within their communication architecture. DNS offers this flexibility and resilience by keeping a malicious service reachable even after it has been moved to another hosting address. Furthermore, DNS traffic is present in all networks and is not usually filtered or blocked by firewalls, thus providing stealthy and undisturbed communication. Miscreants can use common static hosting strategies often deployed by small and medium enterprises, or more dynamic networking strategies similar to Content Distribution Networks (CDNs). In order to achieve high availability and resilience against countermeasures, cyber criminals adopt dynamic DNS-based networking strategies characterized by highly dynamic FQDNs-to-IPs mappings, often referred to as agile DNS (Antonakakis et al., 2010; Berger and Gansterer, 2013). The most well known agile DNS strategies are IP-flux (Holz et al., 2008) and Domain-flux (Yadav et al., 2010). IP-flux, or Fast-flux, refers to the constant changing of the IP address information related to a particular FQDN (Holz et al., 2008): multiple IP addresses are linked to a specific FQDN and the linked addresses are changed rapidly. Fast-flux provides reliable communication due to the high number of IPs associated with a certain FQDN. Domain-flux is
effectively the inverse of IP-flux and refers to the constant changing and allocation of multiple FQDNs to one or multiple IP addresses. The Domain Generation Algorithm (DGA) (Yadav et al., 2010) is one of the most prominent domain-flux techniques; it creates a dynamic list of multiple FQDNs, which are then polled by the malware agent as it tries to locate the C&C infrastructure. Since the domain names are dynamically generated in large volume and typically have a short life, they are able to evade FQDN-based blacklisting. IP-flux and Domain-flux techniques are widely used to dynamically host malicious sites or to provide reliable communication within C&C infrastructure.
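A toy sketch can convey the flavor of domain-flux. The generator below is purely illustrative (it is not the DGA of any real malware family): it derives date-seeded pseudo-random names of the kind that static blacklists struggle to keep up with:

```python
import hashlib
from datetime import date

def toy_dga(seed, day, count=5, tld="biz"):
    """Illustrative date-seeded domain generator (toy, not real malware)."""
    domains = []
    for i in range(count):
        # Hash the seed, date and counter; take a prefix as the 2-LD.
        h = hashlib.md5(f"{seed}-{day.isoformat()}-{i}".encode()).hexdigest()
        domains.append(f"{h[:10]}.{tld}")
    return domains

# Both the malware agent and its operator can regenerate the same list
# for a given day, while defenders see a fresh set of domains daily.
print(toy_dga("example-seed", date(2015, 3, 27)))
```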
2.2. Detection of malicious DNS traffic

During the last decade a number of detection approaches that target malicious DNS have been developed (Antonakakis et al., 2010, 2011; Berger and Gansterer, 2013; Bilge et al., 2014; Choi and Lee, 2012; Lee and Lee, 2014; Ma et al., 2009; Perdisci et al., 2009, 2012; Ramachandran et al., 2006; Villamarín-Salomón and Brustoloni, 2008). DNS-based detection approaches are popular for several reasons. First, spammers rely on DNS for redirecting the user to scam or phishing web pages. Second, the vast majority of malware relies on DNS in some phase of its lifecycle. Third, cyber criminals usually employ agile DNS techniques that produce specific DNS traffic patterns which can be used for their detection. Fourth, DNS traffic represents only a small portion of total traffic, so it is suitable for online processing in wide-area networks and higher network tiers, providing a comprehensive picture of the miscreants' communication infrastructure. The majority of detection approaches target agile DNS, most commonly IP-flux and Domain-flux techniques. By targeting agile DNS, detection approaches cover a wide range of malicious services that rely on the agility of DNS mappings for assuring resilience. Based on the vantage point for capturing DNS traffic, the approaches can be divided into those that look at DNS traffic between hosts and RDNS servers (Antonakakis et al., 2010; Berger and Gansterer, 2013; Bilge et al., 2014; Choi and Lee, 2012; Lee and Lee, 2014; Perdisci et al., 2009; Ramachandran et al., 2006; Villamarín-Salomón and Brustoloni, 2008) and those that analyze DNS traffic in the upper DNS hierarchy (Antonakakis et al., 2011; Ma et al., 2009; Perdisci et al., 2012), i.e., above RDNS.

Detection approaches rely on different perspectives of traffic analysis: some identify malicious domain names (Antonakakis et al., 2011; Bilge et al., 2014), others identify clusters of suspicious domains (Antonakakis et al., 2010; Choi and Lee, 2012; Perdisci et al., 2012), while some use graph theory to analyze DNS mappings (Berger and Gansterer, 2013; Lee and Lee, 2014). Furthermore, contemporary DNS-based detection approaches rely on different detection algorithms, usually based on supervised/unsupervised machine learning techniques. As machine learning techniques are data-driven, they depend on the "quality" of the datasets used for their development, optimization and evaluation. By quality we refer to the existence of a substantial amount of data that successfully captures both malicious and benign DNS traffic characteristics and for which the ground truth is known. Therefore, one of the main prerequisites for the effective use of DNS-based detection approaches is the ability to accurately, reliably and efficiently label DNS traffic as being malicious or not.

3. Labeling practices

Contemporary DNS-based detection approaches consider a number of options for obtaining the ground truth on malicious DNS traffic, where the majority of solutions rely on automated labeling using commercial FQDN/IP blacklists and whitelisting of popular domains. This section gives an overview of the contemporary labeling approaches and their characteristics.

3.1. Labeling in the existing work
In the following we analyze the labeling techniques used by some of the most prominent DNS-based detection approaches (Antonakakis et al., 2010, 2011; Bilge et al., 2014; Choi and Lee, 2012; Perdisci et al., 2012). Table 1 summarizes the labeling techniques used by these approaches. Antonakakis et al. introduced NOTOS (Antonakakis et al., 2010), a dynamic reputation system for DNS that can indicate malicious use of agile DNS. The approach analyzes DNS traffic at RDNS using both supervised and unsupervised machine learning in order to cluster FQDNs into a number of predefined classes of DNS traffic. The authors rely on blacklists and whitelists both for obtaining the ground truth used for training the system and for evaluating its results. The approach uses four different blacklists covering different types of malicious DNS traffic, while whitelisting the 500 top domains from alexa.com as well as a number of common CDNs and other benign dynamic domains (2-LD + TLD). For the evaluation of the system the authors experiment with extensive whitelists of over 10,000 domain names. In addition to NOTOS, Antonakakis et al. proposed KOPIS (Antonakakis et al., 2011), an approach for identifying malware-related domains by analyzing DNS traffic in the upper DNS hierarchy. The approach classifies malicious domain names based on a set of domain-related features using supervised machine learning, and relies on blacklists and whitelists to label the traffic used for both training and evaluation. It uses two publicly available FQDN blacklists and two blacklists based on malware feeds, and whitelists a relatively modest number of top alexa.com domains as well as domains from dnswl.org. Perdisci et al. proposed FluxBuster (Perdisci et al., 2012), a passive DNS traffic analysis system that targets fast-flux DNS traffic by monitoring traffic above RDNS.
The approach classifies clusters of malicious domains using supervised machine learning. The ground truth used for training the supervised MLA is obtained by a semi-manual labeling approach, while blacklists and whitelists are used for evaluating the classification results. The approach relies on more than 12 publicly available FQDN blacklists and three different whitelists, some of which imply extensive whitelisting. Choi and Lee proposed BotGAD (Choi and Lee, 2012), a botnet detection approach that targets malware-related DNS traffic
by monitoring traffic at RDNS. The method employs unsupervised machine learning in order to cluster FQDNs as malicious or non-malicious, and relies on 7 FQDN blacklists and domain reputation services for evaluating the clustering results. Bilge et al. proposed EXPOSURE (Bilge et al., 2014), a large-scale system that analyzes DNS traffic at RDNS in order to detect domain names involved in malicious activity. The detection system uses supervised machine learning to classify domain names as malicious based on a set of statistical features. The system relies on FQDN/IP blacklists and whitelists of popular domains in order to establish the ground truth for the traffic used both for training and evaluation of the classifier. The approach relies on 7 different blacklists and domain reputation tools covering different types of malicious DNS traffic, and additionally uses blacklists based on domains generated by the Torpig and Conficker botnets. Whitelisting considers the 1000 most popular domains from alexa.com.

Table 1 – Overview of labeling practices used by some of the most well regarded contemporary DNS-based detection methods.

NOTOS (Antonakakis et al., 2010) – Training and evaluation
Blacklists: malwaredomains.com (FQDN); malwaredomainlist.com (FQDN and IP); spamhaus.org (IP); zeustracker.abuse.ch (FQDN and IP)
Whitelists: alexa.com – top 500 domains; 18 common 2-LDs for CDNs; 464 dynamic 2-LDs; evaluation with 10,000 and 100,000 top alexa.com domains

KOPIS (Antonakakis et al., 2011) – Training and evaluation
Blacklists: several public blacklists (only two stated): malwaredomains.com (FQDN); zeustracker.abuse.ch (FQDN); FQDNs from two malware feeds
Whitelists: dnswl.org; alexa.com – top 30 domains; network range of whitelisted FQDNs checked by ipindex.homelinux.net

FluxBuster (Perdisci et al., 2012) – Evaluation
Blacklists: abuse.ch (FQDN) – 75 flux 2-LDs; 12 public blacklists (two of them stated): malwaredomains.com (FQDN); malwarepatrol.com (FQDN)
Whitelists: alexa.com – top 100,000 domains; dmoz.org – over 300,000 2-LDs; list of 31 non-malicious 2-LDs (manually compiled)

BotGAD (Choi and Lee, 2012) – Evaluation
Blacklists: kisarbl.or.kr (FQDN); malwaredomains.com (FQDN); cyber-ta.org (FQDN); siteadvisor.com (FQDN); mywot.com (FQDN); domaincrawler.com (FQDN); spamhaus.org (FQDN and IP)
Whitelists: none

EXPOSURE (Bilge et al., 2014) – Training and evaluation
Blacklists: malwaredomains.com (FQDN); zeustracker.abuse.ch (FQDN and IP); malwaredomainlist.com (FQDN); wepawet.cs.ucsb.edu (FQDN); a set of Anubis reports (FQDN); phishtank.com (FQDN); siteadvisor.com (FQDN); safeweb.norton.com (FQDN); FQDNs generated by the Torpig and Conficker botnets
Whitelists: alexa.com – top 1000 domains; FQDNs older than one year; whitelisted FQDNs checked with safeweb.norton.com, Google Safe Browsing, siteadvisor.com and dmoz.org (all FQDN)

3.2. Use of blacklists and whitelists

As illustrated in the previous subsection, blacklists are widely used by contemporary detection approaches for data
labeling. Blacklists are generally compiled by security professionals based on observations of malicious DNS use collected by deployed honeypots, SPAM traps, or malware testing. As such, blacklists represent an invaluable indicator of the nature of FQDNs and IPs. However, blacklists have a number of limitations that need to be understood in order for them to be used successfully for labeling. Blacklists can be defined based on diverse input information and thus often have non-overlapping scopes. In practice, this means that different blacklists will list FQDNs/IPs based on different criteria. Some of them list all SPAM-related domains (The Spamhaus Project Ltd, 2014), others track domains related to a specific botnet (Abuse.ch, 2013; Abuse.ch, 2014), whereas some list domains queried by malware under testing (Malware domain list, 2014). Therefore, different blacklists do not necessarily agree on the nature of a certain FQDN/IP. Additionally, the fact that a FQDN/IP is listed by some blacklist does not necessarily mean that it is significant for the task in question. Some blacklists are based on reputation systems that rely on user feedback regarding the maliciousness of a certain FQDN/IP. This means that such reputation scores depend on the judgment of people with different technical backgrounds and perspectives on what should be considered malicious. Consequently, many benign domains are deemed malicious, possibly leading to
wrong conclusions. For instance, on some reputation systems (Wot Services Ltd, 2014), web pages are rated with regard to unethical content and incorrect product claims, which might be relevant in some contexts, but not from the network security perspective. In addition, it should be noted that the labels of FQDN and IP blacklists differ in how long they remain valid. In the case of an FQDN, once a label is formed it typically does not change over time. IP labels, on the other hand, can change over time, because the IPv4 address space is limited and parts of it have been resold over the years. It can therefore happen that blacklisted addresses are, after some time, cleaned and used by different organizations for legitimate purposes. This implies that IP labeling should not rely on blacklists that are older than a couple of months. Several authors have analyzed the characteristics of FQDN/IP blacklists in order to shed light on the effectiveness of their use (Dietrich and Rossow, 2009; Kührer et al., 2014; Sheng et al., 2009; Sinha et al., 2008). These authors acknowledge the importance of blacklists for identifying the misuse of DNS services, but also point out different limitations. Sinha et al. (2008) found that blacklists are characterized by significant false negative and false positive rates. Sheng et al. (2009) indicate that different blacklists vary in coverage and take a significant time to be updated. Kührer et al. (2014) found that the union of 15 public blacklists includes less than 20% of the malicious domains for a majority of prevalent malware families, and that most anti-virus vendor blacklists fail to protect against malware that utilizes Domain Generation Algorithms. Finally, Dietrich and Rossow (2009) indicate that IP blacklists vary in size and coverage, and that due to the short activity period of IP addresses there is little sense in keeping an IP on a blacklist forever.
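One pragmatic consequence of these findings can be sketched in a few lines: rather than trusting any single list, a labeling pipeline may require agreement from several blacklists before an FQDN is accepted as ground-truth malicious. The list contents and the helper name below are invented for illustration:

```python
# Toy blacklists; real lists would disagree in exactly this fashion.
blacklists = {
    "list_a": {"badsite.biz", "evil.ws", "shared.example"},
    "list_b": {"evil.ws", "shared.example", "stale-entry.org"},
    "list_c": {"shared.example", "unrelated.net"},
}

def consensus_label(fqdn, blacklists, k=2):
    """Label an FQDN 'malicious' only if at least k blacklists agree."""
    hits = sum(fqdn in bl for bl in blacklists.values())
    return "malicious" if hits >= k else "unknown"

print(consensus_label("shared.example", blacklists))  # malicious (3 lists agree)
print(consensus_label("badsite.biz", blacklists))     # unknown (only 1 list)
```

Raising `k` trades false positives for false negatives, which is exactly the coverage/accuracy tension the cited studies observe.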
In addition to evaluating the characteristics of blacklists, some authors, such as Kheir et al. (2014), have proposed methods for pruning blacklists of benign domains and thus eliminating falsely blacklisted entries. Their approach is a valuable tool for cleaning up blacklists, but as it relies on supervised machine learning it also needs to be trained, and thus suffers from the same ground truth problem. Whitelisting of popular domains is a widely used practice for filtering benign DNS traffic. The main idea behind this approach is to build a list of benign domains and use it to filter out benign traffic. However, as there were over 271 million registered domains in 2014 (Verisign, 2014), defining the list of benign ones is not a trivial task. In order to circumvent this
many authors (Antonakakis et al., 2010, 2011; Bilge et al., 2014; Perdisci et al., 2012) generate whitelists by relying on lists of the most popular domains, assuming that popular domains are benign. One of the most used references is the alexa.com list of the most popular domains (Amazon Inc, 2014). Furthermore, some authors use extensive whitelists that cover up to 10,000 of the most popular sites (Antonakakis et al., 2010). The use of whitelists filters out domains and thus reduces the total amount of traffic that needs to be processed. However, as many popular domains are often related to the distribution of malware, extensive whitelisting can filter out information about suspicious domains and consequently lead to sub-optimal labeling.
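The trade-off of extensive whitelisting can be sketched as follows; the domain names and the helper are hypothetical:

```python
# A toy popularity ranking; "popular-but-abused.example" stands in for a
# popular domain that also serves malicious content.
top_domains = ["google.com", "facebook.com",
               "popular-cdn.example", "popular-but-abused.example"]

def whitelist_filter(queried_2lds, top_n):
    """Drop queries whose 2-LD is among the top_n most popular domains."""
    allowed = set(top_domains[:top_n])
    return [d for d in queried_2lds if d not in allowed]

queries = ["google.com", "popular-but-abused.example", "xqjzt.biz"]
# Modest whitelist: the possibly abused popular domain survives for analysis.
print(whitelist_filter(queries, 2))   # ['popular-but-abused.example', 'xqjzt.biz']
# Extensive whitelist: it is silently discarded along with its evidence.
print(whitelist_filter(queries, 4))   # ['xqjzt.biz']
```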
4. The semi-manual labeling approach
In this section we present a novel semi-manual labeling approach that is tailored to capture agile DNS traffic. The purpose of the proposed approach is to provide efficient, reliable and accurate DNS labeling while minimizing human involvement. The approach consists of several phases, as illustrated in Fig. 1. The first phase relies on DNSMap (Berger and Gansterer, 2013) to isolate agile FQDNs-to-IPs mappings that are further analyzed by the system. Extracted mappings are represented as graph components, where FQDNs and IPs are nodes, while edges indicate the existence of a mapping in the DNS between them. The second phase is a filtering phase, in which the extracted graph components are filtered so that only the most interesting ones, from the perspective of network security, are kept for further analysis. The third phase is an automated analysis phase that extracts a number of features designed to capture the malicious nature of graph components. The fourth phase uses the extracted features to cluster agile graph components into malicious and non-malicious components. The final phase of the proposed labeling approach incorporates the operator's expert knowledge in the labeling procedure. The operator analyzes provisional labels and corrects possible errors introduced by the automated part of the system. In support of making a qualified judgment, the operator has all features extracted for the agile graph components at their disposal. The operator labels agile FQDNs-to-IPs mappings as malicious if there is sufficient evidence that they are related to malicious activities such as SPAM distribution,
Fig. 1 – A novel semi-manual labeling approach.
hosting of malware, phishing, botnet communication, etc. The presented labeling procedure assigns a label to each of the extracted graph components. The proposed approach assumes that all FQDNs and IPs within an agile graph component have the same nature. Therefore, a graph component and all FQDNs and IPs belonging to it are considered malicious if a number of FQDNs and/or IPs within that graph are proven to be associated with some malicious activity. This principle can be referred to as labeling by association.
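The labeling-by-association principle can be sketched as follows (the helper and the evidence set are illustrative, not part of the actual system):

```python
# Sketch of "labeling by association": a whole agile graph component
# inherits the malicious label if enough of its nodes have evidence.
def label_component(component_nodes, known_malicious, min_hits=1):
    """component_nodes: set of FQDN/IP nodes of one agile graph component."""
    hits = component_nodes & known_malicious
    return "malicious" if len(hits) >= min_hits else "non-malicious"

component = {"eigvnnph.biz", "ovgvxvtuu.biz", "149.93.66.241"}
evidence = {"eigvnnph.biz"}   # e.g., one domain confirmed as C&C
# One confirmed node is enough: every FQDN and IP in the component is labeled.
print(label_component(component, evidence))  # malicious
```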
4.1. DNSMap
The level of DNS mapping agility of Internet services varies widely. DNSMap's main objective is to provide an adaptive characterization of this agility. The approach tracks FQDNs and the corresponding IPs within an initial learning period in order to get an understanding of typical DNS activity for the observed FQDNs and IPs. This way a profile of normal DNS traffic is learned, and any DNS mapping involving an FQDN/IP that does not fit the profile is considered suspicious. All suspicious mappings in a certain analysis period are analyzed as bipartite graphs, where FQDNs and IPs are nodes, and edges indicate the existence of a suspicious mapping between them. The authors argue that the structural properties of these graphs relate to the maliciousness of the underlying DNS activity, which can be identified by applying standard graph analysis techniques. Examples of the extracted graph components are illustrated in Fig. 2. DNSMap is able to target specific agile DNS traffic by tuning the analysis through the process of parametrization. It should be noted that we perform parametrization in line with the best practice described by Berger and Gansterer (2013). First, DNSMap
uses the first two days of each traffic trace as the initial learning period. The length of the learning period is chosen based on previous experience in using DNSMap (Berger and Gansterer, 2013). Second, in order to capture the weekly dynamics of network traffic, the traffic is then analyzed on a weekly basis, i.e., for each consecutive week a number of agile graph components is extracted.
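As a rough illustration of the graph representation used here (this is our own sketch, not DNSMap itself), the following builds the bipartite FQDN-to-IP graph from a list of suspicious mappings and extracts its connected components by traversal:

```python
from collections import defaultdict

def components(mappings):
    """mappings: list of (fqdn, ip) pairs; returns connected components."""
    adj = defaultdict(set)
    for fqdn, ip in mappings:
        adj[("fqdn", fqdn)].add(("ip", ip))   # tag nodes to keep the two sides apart
        adj[("ip", ip)].add(("fqdn", fqdn))
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:                          # depth-first traversal
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        comps.append(comp)
    return comps

# Two mappings share an IP, so their FQDNs fall into one agile component.
maps = [("a.biz", "1.2.3.4"), ("b.biz", "1.2.3.4"), ("c.ws", "5.6.7.8")]
print(len(components(maps)))  # 2
```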
4.2. Filtering graph components
In the filtering phase, we select the graph components that are most interesting for our analysis. The main goal of this phase is to prune the graph components isolated by DNSMap, filtering out the less agile graph components and selecting the ones that are more likely to facilitate malicious services. The challenge is to define the filtering so that it results in a number of graphs that can be manually validated in the last phase of the approach, while still successfully capturing malicious DNS activity. Malicious traffic has to move across several FQDNs and IPs in an attempt to avoid being blocked, therefore exhibiting agile behavior. The agility of DNS traffic can be broadly quantified by the number of FQDNs and IPs used within the period of observation. In order to avoid being blacklisted, FQDNs need to be changed often, as in the case of DGAs, where usually a couple of hundred domains are generated per day for a specific malicious service (Antonakakis et al., 2012; Yadav et al., 2010). In the case of IP-flux, miscreants use a different IP address every couple of minutes, often having hundreds of IPs distributed over a number of different countries and ASs (Holz et al., 2008). For the presented semi-manual labeling approach we define "interesting" graph components as ones with at least 40 FQDNs
(a) A smaller, less agile, graph component.
(b) A bigger, more agile, graph component.
Fig. 2 – Examples of DNSMap output i.e., agile graph components. Nodes within graph components represent FQDNs (red) and IPs (blue) while edges represent existence of mappings between them. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
or at least 20 IPs, while having their IPs belong to at least 2 Autonomous Systems (ASs), within a week-long analysis period. In light of the domain-flux (Antonakakis et al., 2012; Yadav et al., 2010) and IP-flux (Holz et al., 2008) studies, these filtering thresholds can be considered conservative, thus capturing even moderately agile DNS. Furthermore, these criteria result in a manageable number of graph components that need to be manually validated. This has been confirmed by experiments with real data, where the adopted thresholds resulted in about a hundred graph components that need to be manually validated weekly, as shown by Table 4. We are aware that some of the filtered-out agile components can still be malicious, such as a botnet whose relaying proxies belong to the same AS. However, such agile DNS would not provide resilient and reliable malicious hosting or relaying, and would be better countered by conventional countermeasures. This is supported by the findings of Knysz et al. (2011), who analyzed fast-flux evasion strategies and derived models describing the relation between the number of online malware IP addresses and the availability of the corresponding malware sites. They consider a minimum of 100 unique IP addresses per week and FQDN, which according to their model results in a connection loss probability of 71.1%. Although this would already result in poor malware connectivity, this kind of activity is still well within the sensitivity of the proposed filtering. Therefore we believe that the minimal numbers of FQDN and IP nodes defined by the adopted thresholds are quite low, and that the filtering does not cause a significant loss of information on malicious agile DNS traffic.
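The filtering thresholds stated above can be expressed directly; the helper name is our own:

```python
# The paper's filtering rule: keep a weekly component if it has at least
# 40 FQDNs or at least 20 IPs, and its IPs span at least 2 ASs.
def is_interesting(n_fqdns, n_ips, n_ass):
    return (n_fqdns >= 40 or n_ips >= 20) and n_ass >= 2

print(is_interesting(n_fqdns=120, n_ips=5,  n_ass=3))  # True  (domain-flux-like)
print(is_interesting(n_fqdns=3,   n_ips=45, n_ass=7))  # True  (IP-flux-like)
print(is_interesting(n_fqdns=50,  n_ips=30, n_ass=1))  # False (single AS)
```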
4.3. Automated analysis
Automated analysis is used for characterizing the graph components returned by the filtering phase. This phase extracts a number of features for each of the selected graph components. The features were designed leveraging both theoretical knowledge and empirical evidence on the actual behavior of malicious traffic. The features can be grouped in 6 categories corresponding to the different types of analysis performed on the graph components:
1. Graph analysis examines connections between nodes in graphs to recognize connectivity patterns that characterize common hosting/fast-flux/domain-flux. The main assumption is that malicious FQDNs-to-IPs mappings are characterized by a specific graph structure. The structural properties of graphs are described by the following features:
• Number of FQDN/IP nodes. Bigger graph components are more likely linked to malicious behavior, as they represent more agile mappings within the analysis period.
• Maximum degree of FQDN/IP nodes. IP-flux and domain-flux are characterized by a high maximal value of the degree of FQDN and IP nodes, respectively. The degree of a node of a graph is the number of edges incident to the node, with loops counted twice.
• Maximum FQDN/IP node betweenness. Node betweenness is a global centrality measure that quantifies the importance of a particular vertex in a graph; it is equal to the number of shortest paths from all vertices to all others that pass through that node. IP-flux is characterized by a high maximal value of IP node betweenness for a specific FQDN node, while domain-flux is characterized by a high maximal value of FQDN node betweenness for a specific IP node.
2. FQDNs analysis examines the domain names within the graph component in order to check for suspicious-looking pseudo-random domains, which are often associated with malicious services. The properties of domain names are described using the following set of features:
• Number of tokens (n-LDs). Many legitimate services rely on FQDNs with more than 3 levels.
• Suspicious TLDs. Indicates whether any of the 50 TLDs most commonly associated with malicious activity (Damballa Inc, 2009; Kay and Greve, 2011; Yan, 2013) are present in the graph component.
• 2-LDs/3-LDs features:
– Number of English words in 2-LDs/3-LDs. Pseudo-random domains are characterized by a smaller number of words within them.
– Number of numerical characters in 2-LDs/3-LDs. Pseudo-random domains are characterized by a higher number of numerical characters.
– Length of 2-LDs/3-LDs. Malicious pseudo-random domains often have long pseudo-random 2-LDs or 3-LDs.
• Share of the most frequent 2-LDs/3-LDs within the graph component. A situation in which a specific 2-LD or 3-LD dominates within the graph component, i.e., when the majority of domains share the same 2-LD or 3-LD, is usually associated with legitimate services.
3. IPs analysis examines the IPs within the graph component. The main assumption is that IPs are more diverse within malicious graphs. The properties of IPs are described using the following set of features:
• Number of ASs. A high number of ASs to which the IP addresses within the graph component belong is often associated with malicious DNS strategies.
• Trustworthy ASs. Trustworthy ASs are those belonging to well-known service providers. Benign graph components usually have IPs hosted by trustworthy ASs.
• Number of countries. Malicious services are usually hosted over a high number of countries.
• Geographical distance between hosting countries. Malicious graph components are often characterized by greater distances between the countries in which their IPs are hosted.
• Suspicious countries. Indicates whether IPs are hosted in any of the 25 countries well known for hosting malicious domains (Yan, 2013).
• Scatterness of IPs. Malicious hosting often uses IP addresses that are scattered over multiple networks. In this paper we rely on the scatterness measure proposed by Berger and Gansterer (2013).
For determining the number of hosting ASs and countries we used MaxMind's freely available databases of AS numbers (Maxmind Inc, 2014a) and countries (Maxmind Inc, 2014b). The geographical locations of countries were determined using the OpenStreetMap geocoder service (Openstreetmap, 2014).
4. FQDN whitelist analysis examines whether popular domains are present in the analyzed graph component. The main assumption is that the presence of popular domains can indicate that the graph component is benign. The whitelisting analysis is described using the following set of features:
• Number of whitelisted domains. A graph component containing a substantial number of popular domains is often non-malicious.
• Top 2-LD whitelisted. If a specific 2-LD dominates within the graph component and is among the most popular domains, the component is most likely non-malicious.
Whitelisting of popular domains was implemented using the alexa.com list of the most popular domains (Amazon Inc, 2014).
5. FQDN blacklist analysis checks whether the FQDNs within the graph component are blacklisted. The main assumption is that malicious components are characterized by a number of blacklisted domains. The FQDN blacklist analysis is described using the following set of features:
• Number of blacklisted FQDNs. A graph component could be considered malicious if a certain number of FQDNs within it are blacklisted.
• Number of blacklisted 2-LD + TLD. A graph component could be considered malicious if a certain number of 2-LDs + TLDs within it are blacklisted.
• Blacklist status. A graph component could be considered malicious if the FQDNs within it are blacklisted by a substantial number of blacklists. We define the Blacklist Status as the ratio of the number of blacklists that flagged a certain FQDN to the total number of checked blacklists.
Domains were checked against 29 domain blacklists available through urlvoid.com.
6. IP blacklist analysis checks whether the IPs within the graph component are blacklisted. The main assumption is that malicious components are characterized by a number of blacklisted IPs. The IP blacklist analysis is described by the following features:
• Number of blacklisted IPs. A graph component could be considered malicious if a certain number of IPs within it are blacklisted.
• Blacklist status. A graph component could be considered malicious if the IPs within it are blacklisted by a substantial number of blacklists. We define the Blacklist Status as the ratio of the number of blacklists that flagged a certain IP to the total number of checked blacklists.
• "Active" IPs. A graph component is considered malicious if blacklisted IPs were observed in a malicious context in the same period in which the trace was recorded.
IPs were checked against 36 IP blacklists available through ipvoid.com, while the projecthoneypot.org service was used to check whether any malicious activity was observed for a certain IP address at a certain time.
The presented features are implemented as a number of numerical and binary features. The full list is given in Table 2.
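A few of the structural features from the graph analysis category can be sketched as follows (an illustrative Python sketch assuming a component is given as a list of (FQDN, IP) edge pairs; the node betweenness features would additionally require a shortest-path computation, e.g. via networkx):

```python
from collections import Counter

def graph_features(mappings):
    """Structural features of one agile component, given its
    FQDN-to-IP mappings as (fqdn, ip) edge pairs (sketch)."""
    fqdn_degree = Counter(f for f, _ in mappings)  # edges per FQDN node
    ip_degree = Counter(i for _, i in mappings)    # edges per IP node
    return {
        "num_fqdns": len(fqdn_degree),
        "num_ips": len(ip_degree),
        # a high max FQDN degree (one name, many IPs) suggests IP-flux,
        # a high max IP degree (one IP, many names) suggests domain-flux
        "max_fqdn_degree": max(fqdn_degree.values(), default=0),
        "max_ip_degree": max(ip_degree.values(), default=0),
    }
```

The degree counts follow directly from the edge list because the FQDN-to-IP graph is bipartite and loop-free.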
4.4. Cluster analysis

Cluster analysis groups graph components into two clusters representing malicious and non-malicious components. The main motivation for using clustering is to assign provisional labels to each of the graph components, based on a well-defined clustering algorithm and a set of features describing malicious agile traffic. This way the operator has an initial assumption regarding the nature of the graph components, which should save time in comparison to defining the labels purely based on his own expert knowledge. The cluster analysis is implemented using the K-means clustering algorithm (Jain, 2010). Each graph component is an instance represented as the set of features extracted by the Automated Analysis phase. In total we have defined 32 features that are considered for the task of clustering graph components, as illustrated by Table 2. The analysis of the features and their impact on clustering results is discussed in Section 5. The proposed clustering follows the principles of DNS traffic analysis proposed by Bilge et al. (2014) and Antonakakis et al. (2010). However, it should be noted that the proposed clustering algorithm is not developed with the ambition of competing with existing DNS detection approaches. The clustering has the goal of coarsely grouping the graph components, thus assisting the operator in the process of labeling traffic rather than replacing his insight with a fully automatic system.

Table 2 – Features extracted for each agile graph component.

Type of analysis          Features
Graph analysis            Number of IPs; Number of FQDNs; Max degree of IP nodes; Max degree of FQDN nodes; Max FQDN node betweenness for IP nodes; Max IP node betweenness for FQDN nodes
FQDNs analysis            Max number of tokens (n-LD); Number of distinct suspicious TLDs; Max length of 2-LD; Max length of 3-LD; Max number of numeric characters; Max number of words (English); Share of the most frequent 2-LD; Share of the most frequent 3-LD
IPs analysis              Number of ASs; Number of trustworthy ASs; Number of distinct hosting countries; Average distance between hosting countries; Number of suspicious countries; IPDistScore – measure of IP scatteredness
FQDNs whitelist analysis  Number of whitelisted FQDNs in top 10,000 alexa.com most popular domains; The most frequent 2-LD in top 1000 alexa.com most popular domains
FQDNs blacklist analysis  Number of blacklisted FQDNs; Average FQDNs blacklist status; Max FQDNs blacklist status; Number of blacklisted 2-LD + TLD; Average 2-LD + TLD blacklist status; Max 2-LD + TLD blacklist status
IPs blacklist analysis    Number of blacklisted IPs; Average IPs blacklist status; Max IPs blacklist status; Number of "active" IPs

4.5. Assigning provisional labels
Cluster analysis groups graph components into two distinct clusters, but does not assign labels to them. In order to assign
provisional labels to the two clusters, the approach evaluates the following assumption regarding the characteristics of malicious and non-malicious clusters: we assume that the malicious cluster has a higher average number of blacklisted FQDNs per graph component than the non-malicious cluster. A cluster is therefore marked as malicious if it fulfills this assumption. The rationale behind the assumption is that blacklists represent important indicators of the maliciousness of FQDNs and IPs, so it is reasonable to expect that graphs within the malicious cluster will be characterized by a higher average number of blacklisted domains. It should be noted that we do not use the equivalent assumption based on IP blacklists, as IP blacklists are arguably less reliable due to the nature of the malicious use of IPs (Dietrich and Rossow, 2009) and the consequently different validities of IP blacklist labels, as discussed in Section 3.2.
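The clustering and provisional-labeling steps of Sections 4.4 and 4.5 can be sketched together as follows (a self-contained illustration using a minimal 2-means routine in place of a library K-means; the position of the "number of blacklisted FQDNs" feature in the vectors is an assumption of the sketch):

```python
import random

def kmeans2(points, iters=50, seed=0):
    """Minimal 2-means over equal-length feature vectors; returns a
    cluster index (0 or 1) per point. A stand-in for K-means (Jain, 2010)."""
    rnd = random.Random(seed)
    centers = [list(p) for p in rnd.sample(points, 2)]
    assign = [0] * len(points)
    for _ in range(iters):
        # assign each point to its nearest center (squared Euclidean distance)
        assign = [min((0, 1), key=lambda c: sum((x - y) ** 2
                                                for x, y in zip(pt, centers[c])))
                  for pt in points]
        # recompute each center as the mean of its members
        for c in (0, 1):
            members = [pt for pt, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def provisional_labels(components, assign, blacklisted_fqdns_index=0):
    """Mark as malicious the cluster with the higher average value of the
    'number of blacklisted FQDNs' feature (feature index assumed here)."""
    avg = []
    for c in (0, 1):
        vals = [comp[blacklisted_fqdns_index]
                for comp, a in zip(components, assign) if a == c]
        avg.append(sum(vals) / len(vals) if vals else 0.0)
    malicious_cluster = 0 if avg[0] > avg[1] else 1
    return ["malicious" if a == malicious_cluster else "non-malicious"
            for a in assign]
```

With toy feature vectors such as [[5.0, 0.9], [6.0, 1.0], [0.0, 0.1], [1.0, 0.0]], the two components with many blacklisted FQDNs end up in the cluster marked malicious.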
4.6. Manual validation
The final phase of the proposed labeling approach is the manual validation. This phase is one of the key elements of the proposed approach as it incorporates human insight into the labeling process. Although we could use the automatically generated provisional labels for making the final decision, it is beneficial to involve a human operator. Here, the operator performs manual inspection of the provisional labels assigned by the clustering in order to eliminate any errors in labeling. The main task of the operator is to go through all graph components extracted by the system and check the provisional labels assigned in the previous step. In order for the operator to perform the validation in the most efficient way, the system provides the values of all features extracted for the labeled graphs by the Automated Analysis phase. Furthermore, the operator has the result of the assumption check performed within the process of assigning provisional labels. In most cases the operator can validate the label of a graph component simply by reviewing the values of the extracted features. However, in some cases there is insufficient evidence, or there are contradictory indications from different analysis steps, and the operator needs to perform further investigation. This might include actions such as searching the web for security reports on particular FQDNs/IPs or resolving a subset of FQDNs in a browser.
5. Case study
In this section we evaluate the proposed labeling approach within a case study based on traffic from an ISP network by comparing it with conventional labeling approaches that rely on FQDN/IP blacklists and whitelisting of popular domains. The proposed approach is implemented in Python by extending existing DNSMap tools (Berger, 2014). The automated analysis phase of the proposed labeling approach relies on a number of external resources and on-line services. External resources used by the system are: MaxMind’s freely available databases of AS numbers (Maxmind Inc, 2014a) and countries (Maxmind Inc, 2014b) and alexa.com list of the most popular domains (Amazon Inc, 2014). Online services used by
the approach are: the OpenStreetMap online geocoder service (Openstreetmap, 2014), the urlvoid.com domain blacklist service (Urlvoid, 2014), the ipvoid.com IP address blacklist service (IPvoid, 2014) and the projecthoneypot.org IP data directory service (Unspam Technologies, Inc., 2014). The evaluation was done in September–November 2014, so the results presented in this section reflect the state of the external resources and on-line services at that time.
5.1. Dataset
For the evaluation we use DNS traffic traces recorded at the network of a regional ISP from Denmark. Two traces, recorded in September 2013 and June–July 2014, i.e., DS1 and DS2, were available for the evaluation. The traces contain all DNS traffic (UDP port 53) produced within the network during these periods of time. The data sets represent a substantial amount of DNS traffic recorded at different times, thus providing a comprehensive overview of the characteristics of agile DNS traffic. A summary of the data sets is presented in Table 3. As already mentioned, DNSMap takes the first two days from each of the two traces for "learning" the usual DNS mappings and extracts a number of agile graphs for each consecutive week of traffic. As a result we have 5 weeks of traffic at our disposal. The graph components extracted for each of the weeks are then labeled using the proposed approach. It should be noted that the performance of processing DNS traffic using DNSMap during the evaluation was in line with that reported by Berger and Gansterer (2013): the DS1 trace was processed in 16 hours, while DS2 was processed in around four days. Furthermore, the automated part of labeling takes 2 and 12 hours for the DS1 and DS2 data traces, respectively. Table 4 reports the results of pre-processing the available data sets using DNSMap as well as the results of filtering the extracted graph components using the settings defined in Section 4.2. The filtering phase reduces the number of graph components by a factor of 1000, resulting in a few hundred graph components, suitable for manual analysis by the operator.
Table 3 – DNS data sets used for the evaluation.

Trace  Duration  Total mappings  Unique FQDNs  Unique IPs
DS1    7 days    251M            3.04M         742k
DS2    28 days   1.84B           10.89M        1.4M

Table 4 – The results of pre-processing available data sets using DNSMap and the results of filtering the extracted graph components.

                           DNSMap output                         Filtered DNSMap output
Trace  Duration            Graph comp.  FQDNs     IPs            Graph comp.  FQDNs   IPs
DS1 Week #1  5 days        95,595       345,593   139,720        90           9376    10,077
DS2 Week #1  7 days        133,259      530,755   199,469        163          27,140  8212
DS2 Week #2  7 days        129,871      510,507   194,806        176          25,618  15,553
DS2 Week #3  7 days        137,779      558,221   207,472        168          38,605  20,350
DS2 Week #4  5 days        103,615      381,685   156,814        112          14,370  10,695

5.2. Performance of cluster analysis

In order to achieve accurate clustering of components, we analyze the performance of the clustering algorithm with respect to the feature set used to represent the components. We consider the features presented in Table 2, analyzing the use of all features as well as the use of the features from each specific analysis type. To perform the evaluation, the clustering results are compared against the labels produced by the proposed semi-manual labeling approach, in which the operator manually evaluated all graph components. Finally, it should be noted that we use graph components from all five weeks of data sets DS1 and DS2. In this way we analyze a higher number of graph components (instances), with the intent of generalizing the evaluation results as much as possible. We have experimented with the features presented in Table 2 with the objective of identifying the minimal set of features that provides the most accurate clustering. We evaluated the clustering results for the sets of features extracted by the different levels of analysis. Fig. 3 illustrates the results of the evaluation. The performance of clustering is expressed by accuracy (ACC), false negative rate (FNR) and false positive rate (FPR), defined as follows:

ACC = (TP + TN)/(P + N);  FNR = FN/(FN + TP);  FPR = FP/(FP + TN)

where P, N, FN, FP, TP and TN are the numbers of positives, negatives, false negatives, false positives, true positives and true negatives, respectively. The results indicate that the different analyses provide diverse performance. The highest accuracy was achieved by using the set of features extracted by FQDN blacklist analysis, where over 73% of graph components were clustered correctly. Relying on other levels of analysis leads to less accurate clustering. The worst performance was obtained when relying on features from IP analysis and FQDN whitelist analysis, with an accuracy close to random choice. Furthermore, using features from all levels of analysis provides inferior results in comparison to the use of FQDN blacklist features alone. Therefore, we choose to use only the set of features extracted by FQDN blacklist analysis for the cluster analysis of the proposed labeling approach. Table 5 presents a confusion matrix for this particular case. The presented results are far from perfect, which can be attributed to the fact that malicious and non-malicious services often rely on the same agile DNS strategies, such as fast-flux and domain-flux. The results presented in Table 5 indicate the benefit of the manual validation step, as the manual validation significantly changes the provisional labels, thus improving the overall accuracy of the labeling process.
Fig. 3 – Clustering performance when relying on features extracted by different levels of analysis.

Table 5 – Confusion matrix for clustering that relies on features extracted by FQDNs blacklist analysis.

                         Actual malicious   Actual non-malicious
Assigned malicious       147                56
Assigned non-malicious   132                374
Total                    279                430

5.3. Results of semi-manual labeling

Using the proposed semi-manual labeling, with the cluster analysis relying on features extracted by the FQDN blacklist analysis, we have labeled the graph components from all 5 weeks of DNS traces. The results of the semi-manual labeling on the two DNS data sets are illustrated in Fig. 4. The figure shows how many graph components are marked as malicious/non-malicious within the 5 weeks of analyzed traffic. Furthermore, it shows how many FQDNs and IPs are considered malicious/non-malicious when they are classified according to the label of the graph component they belong to. Fig. 4a shows that the percentage of graph components labeled as malicious is fairly similar across all 5 weeks, ranging from 36% to 44%. On the other hand, the corresponding percentages of FQDNs/IPs labeled as malicious/non-malicious change significantly, indicating that agile graph components greatly vary in size over different weeks. The results indicate that a relatively small percentage of the agile graph components can be considered malicious, whereas many non-malicious services rely on agile DNS. Finally, it should be noted that the presented labeling approach is time-efficient: for labeling 5 weeks of DNS traffic, we needed around 4 days for running the automated analysis and performing the manual validation, which is less than a day per week of traffic trace.
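As a sanity check, the ACC/FNR/FPR definitions from Section 5.2 can be applied to the confusion matrix of Table 5, treating "malicious" as the positive class; the values below are computed from the table, not newly measured:

```python
# Cross-checking the Section 5.2 metrics against Table 5:
# 147 TP (malicious correctly clustered), 56 FP, 132 FN, 374 TN.
TP, FP, FN, TN = 147, 56, 132, 374
P, N = TP + FN, TN + FP            # 279 actually malicious, 430 actually benign
ACC = (TP + TN) / (P + N)          # fraction of correctly clustered components
FNR = FN / (FN + TP)               # malicious components put in the benign cluster
FPR = FP / (FP + TN)               # benign components put in the malicious cluster
print(round(ACC, 2), round(FNR, 2), round(FPR, 2))  # 0.73 0.47 0.13
```

This reproduces the roughly 73% accuracy reported for the FQDN blacklist feature set in Fig. 3.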
5.4. Evaluating blacklisting practices

In order to evaluate the use of blacklists, we tested all FQDNs/IPs contained in graph components extracted by DNSMap from the DS1 and DS2 traces against the blacklists available through urlvoid.com and ipvoid.com. The results of this evaluation are expressed by the Blacklist Status introduced in Section 4. Table 6 illustrates the results of the evaluation. The results indicate that only a small percentage of the total FQDNs/IPs involved in agile graph components is blacklisted. For IPs the percentage ranges from 2.80% to 4.13%, while for FQDNs the percentage is between 6.59% and 8.46%. In addition, even the clearly suspicious FQDNs and IPs are flagged by only one-third and one-fifth of the blacklists, respectively. Table 7 shows some of the FQDNs and IPs blacklisted by a high number of blacklists. The presented FQDNs and IPs are associated with different malicious services such as malware distribution, spam, phishing, etc. Fig. 5 illustrates the distribution of the Blacklist Status for blacklisted FQDNs and IPs within the traces. From the figure we can conclude that over 75% of the blacklisted FQDNs/IPs are listed by only one blacklist. This illustrates that there is often only a limited indication of "maliciousness", and that the blacklists have different notions of what should be considered malicious, resulting in different and often non-overlapping scopes.
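The Blacklist Status used throughout this evaluation can be sketched as follows (an illustrative Python sketch; the blacklists are modeled as sets of flagged names, standing in for the urlvoid.com/ipvoid.com lookups):

```python
def blacklist_status(name, blacklists):
    """Blacklist Status as defined in Section 4: the fraction of the checked
    blacklists that flag the given FQDN or IP. `blacklists` is a list of
    sets of flagged names (a stand-in for the online lookup services)."""
    flagged = sum(1 for bl in blacklists if name in bl)
    return flagged / len(blacklists)
```

An FQDN flagged by 7 of the 29 domain blacklists would thus have a status of 7/29 ≈ 0.24, the DS1 maximum reported in Table 6.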
5.5. Evaluating whitelisting practice

Although effective in filtering out benign DNS traffic and thus reducing the analysis load, the whitelisting practice can lead to a loss of information about the presence of malicious DNS traffic. In order to illustrate this, we evaluate the use of FQDN whitelists for filtering non-malicious DNS traffic. We explore whether any of the most popular domains are labeled as malicious, and how many blacklists have flagged them as such, by testing them against the blacklists available through urlvoid.com. We use the first n most popular domains from alexa.com to form a whitelist, where
Fig. 4 – Results of semi-manual labeling: (a) sub-graphs, (b) FQDNs, (c) IPs. Results are given for data set 1 (DS1) and four separate weeks of data set 2 (DS2-Wx).
Table 6 – Evaluation of the use of FQDN/IP blacklists.

FQDN blacklisting:
Trace  Blacklisted FQDNs  Maximal blacklist status  Average blacklist status  FQDNs with minimal blacklist status
DS1    8.46%              0.24 (7/29)               0.0435 (≤2/29)            78.81%
DS2    6.59%              0.41 (12/29)              0.0461 (≤2/29)            78.08%

IP blacklisting:
Trace  Blacklisted IPs    Maximal blacklist status  Average blacklist status  IPs with minimal blacklist status
DS1    4.13%              0.222 (8/36)              0.0373 (≤2/36)            75.48%
DS2    2.80%              0.19 (7/36)               0.037 (≤2/36)             75.31%
Table 7 – Examples of the most blacklisted FQDNs and IPs.

Trace  FQDN (blacklist status by urlvoid.com)  IP (blacklist status by ipvoid.com)
DS1    thefishkaforyou.su (6/29)               210.213.49.150 (7/36)
       coa.su (5/29)                           62.148.67.62 (7/36)
       gpt0.ru (4/29)                          178.158.224.99 (6/36)
       wzcom.org (4/29)                        37.57.247.240 (4/36)
DS2    getapplicationmy.info (12/29)           62.148.67.62 (7/36)
       appllicatiionew.com (11/29)             77.87.159.174 (6/36)
       upgrade.questscantwo.com (9/29)         162.213.1.5 (5/36)
       allbestnew.com (9/29)                   159.224.225.44 (5/36)
Fig. 5 – Distribution of blacklist status for blacklisted FQDNs and IPs.
n ∈ {10, 100, …, 100,000}. The results of the analysis are illustrated in Table 8. The results show that a relatively high percentage of the popular domains have been blacklisted; in the case of the first 100 most popular domains, 35% are blacklisted by at least one blacklist. The percentage of popular domains being blacklisted decreases as we consider a higher number of the most popular domains, but it is still significant. This can be attributed to the deficiencies of the labeling practices deployed by blacklists and to the diverse contexts in which blacklists define labels. As a result, many benign and highly popular domains are blacklisted, such as youtube.com or baidu.com. However, there are also numerous dubious domains often associated with malicious activity that rank quite high on the list of the most popular domains, such as sj88.com and updatersoft.com. Therefore, we can conclude that even moderate whitelisting of popular domains can cause a number of potentially malicious domains to be excluded.
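The whitelist check performed here can be sketched as follows (an illustrative Python sketch with toy data; real inputs would be the top-n alexa.com domains and the urlvoid.com blacklists):

```python
def blacklisted_popular_domains(top_domains, blacklists):
    """For a candidate whitelist (the top-n popular domains), report which
    entries are flagged by at least one blacklist and by how many,
    illustrating the information-loss risk of whitelist pre-filtering."""
    return {d: sum(1 for bl in blacklists if d in bl)
            for d in top_domains
            if any(d in bl for bl in blacklists)}
```

Any domain returned by this check would be silently excluded from further analysis if the whitelist were applied as a pre-filter.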
Table 8 – Blacklist evaluation of the most popular domains (2-LD + TLD).

Top alexa.com  Blacklisted     Maximal blacklist status  Average blacklist status  Examples of blacklisted domains (Alexa rank, blacklist status)
10             3 (30.00%)      0.1034 (3/29)             0.0689 (≤2/29)            youtube.com (3, 1/29); baidu.com (5, 3/29); qq.com (8, 2/29)
100            35 (35.00%)     0.0689 (2/29)             0.0412 (≤2/29)            adcash.com (52, 2/29); t.co (53, 2/29); odnoklassniki.ru (82, 2/29)
1000           196 (19.60%)    0.1034 (3/29)             0.0434 (≤2/29)            bitly.com (273, 2/29); goo.gl (358, 2/29); sharelive.net (567, 3/29)
10,000         1500 (15%)      0.2759 (8/29)             0.0467 (≤2/29)            sj88.com (3026, 6/29); updatersoft.com (4567, 6/29); cnrdn.com (7215, 8/29)
100,000        8797 (8.797%)   0.4138 (12/29)            0.0443 (≤2/29)            winmediaplayer.com (19,307, 8/29); goggle.com (57,354, 7/29); zilliontoolkitusa.info (82,905, 12/29)
5.6. Comparison of automated and semi-manual labeling
In the following we compare the performance of conventional automated labeling, based on FQDN/IP blacklists and whitelisting of popular domains, against the proposed semi-manual labeling. We consider two automated labeling strategies: the first relies solely on FQDN/IP blacklists, while the second uses whitelisting of popular domains as a pre-filtering step before applying the blacklists for labeling. Within the evaluation we examine the performance of labeling FQDNs/IPs as well as of labeling the graph components extracted by DNSMap. The first evaluation scenario compares the differences in the number of FQDNs and IPs labeled as malicious by the two labeling practices. It should be noted that in the case of semi-manual labeling we assume the concept of maliciousness by association, where FQDNs and IPs receive the same label as the graph component they belong to. Table 9 shows the difference in labeling FQDNs/IPs when the automated approach based on blacklists is compared with the proposed semi-manual approach, while Fig. 6 illustrates the difference in labeling FQDNs between automated labeling based on FQDN whitelisting plus FQDN blacklists and the proposed approach. Fig. 6 also illustrates the influence of the number of most popular domains
Table 9 – Automated labeling of FQDNs/IPs using blacklists. Results are given for data set 1 (DS1) and four separate weeks of data set 2 (DS2-Wx).

                FQDNs                     IPs
Trace           FNR      FPR              FNR      FPR
DS1 Week #1     0.9053   0.076952         0.9393   0.035723
DS2 Week #1     0.9033   0.056900         0.9300   0.014691
DS2 Week #2     0.9011   0.047728         0.9531   0.020759
DS2 Week #3     0.8701   0.056726         0.9303   0.025944
DS2 Week #4     0.8930   0.053559         0.9144   0.020204
Fig. 6 – Automated labeling relying on FQDN whitelists and blacklists. Results are given for data set 1 (DS1) and four separate weeks of data set 2 (DS2-Wx).
used for whitelisting on the labeling performance. The results are expressed using FNR and FPR. The presented results indicate that relying solely on blacklists for labeling FQDNs/IPs would cover only a small percentage of the total number of FQDNs/IPs flagged by the semi-manual approach. As a result, the performance comparison for all five weeks is characterized by high values of FNR and relatively low values of FPR. These results can be attributed to the ability of the proposed semi-manual labeling approach to recognize FQDNs and IPs that are related to each other and assign them the same label. The low values of FPR can be attributed to the fact that malicious graph components usually have at least some FQDNs/IPs flagged as malicious. Using the combination of whitelisting popular domains and domain blacklists, a certain number of popular FQDNs is filtered out. The number of filtered domains is low in comparison with the total number of FQDNs within the traces, so we observe similar results as in the previous case. In direct comparison with the use of blacklists alone, we can notice a small overall increase of FNR and FPR. Furthermore, with more extensive whitelisting FNR slightly increases while FPR decreases. This can be explained by the fact that some malicious popular domains are filtered out by the whitelisting procedure. The second evaluation scenario compares the results of labeling graph components using the automated labeling approaches against the semi-manual labeling approach. In this scenario, automated labeling is implemented in such a way that a graph component is labeled as malicious if a certain number of FQDNs/IPs belonging to it are blacklisted. If domain whitelisting is used within the automated approach, the popular domains are first filtered out and then the automated labeling based on blacklists is performed. Fig. 7 presents the results for the second evaluation scenario, showing the FPR and FNR values when the automated labeling approach is compared to the semi-manual labeling. The figure shows the relation between the two performance metrics and the value of the "decision threshold", i.e., the number of blacklisted FQDNs/IPs for which a graph component is deemed malicious. The results indicate moderate values of FPR and FNR for the different numbers of blacklisted FQDNs/IPs used as a trigger for labeling graph components. By increasing the value of the FQDN decision threshold, FNR increases while FPR decreases. This can be attributed to the fact that increasing the threshold will exclude some malicious components that have a small number of blacklisted FQDNs, as well as some non-malicious graphs with some blacklisted FQDNs. In the case of IP blacklists we observe the same trend, but with a much higher FPR than when using FQDN blacklists. This can be explained by the fact that the existence of blacklisted IPs is less indicative than the existence of blacklisted FQDNs. Fig. 8 presents the results of using both the whitelisting of popular domains and the blacklisting procedure, for different numbers of popular domains used for whitelisting. The presented results were generated for an FQDN decision threshold equal to 3, as for this value we observe optimal values of FPR and FNR. Varying the number of popular domains used for whitelisting, the value of FNR slightly increases while the
[Figure 7 comprises two panels, (a) FQDNs and (b) IPs, plotting FNR and FPR (0.0 to 1.0) against the decision threshold, i.e. the number of blacklisted FQDNs or IPs (1 to 10), for DS1 and DS2-W1 to DS2-W4.]
Fig. 7 – Labeling of graph components. Automated labeling relying on FQDN/IP blacklists. Results are given for data set 1 (DS1) and four separate weeks of data set 2 (DS2-Wx).
value of FPR slightly decreases, similar to the results illustrated in Fig. 6. It should be noted that the same trend was observed for different values of the decision threshold.
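The automated component-labeling scheme evaluated above lends itself to a compact formulation. The following is a minimal sketch, with hypothetical data structures and function names rather than the paper's actual implementation: a graph component is deemed malicious when at least a threshold number of its FQDNs are blacklisted, and FNR/FPR are computed against the semi-manual labels treated as ground truth.

```python
# Sketch of threshold-based automated labeling of graph components
# (hypothetical data structures, not the paper's code).

def label_components(components, blacklist, threshold):
    """components: component id -> set of FQDNs; returns id -> bool."""
    return {cid: len(fqdns & blacklist) >= threshold
            for cid, fqdns in components.items()}

def fnr_fpr(auto_labels, truth_labels):
    """FNR/FPR of the automated labels against the semi-manual ground truth."""
    fn = sum(1 for c, mal in truth_labels.items() if mal and not auto_labels[c])
    fp = sum(1 for c, mal in truth_labels.items() if not mal and auto_labels[c])
    pos = sum(1 for mal in truth_labels.values() if mal)
    neg = len(truth_labels) - pos
    return (fn / pos if pos else 0.0), (fp / neg if neg else 0.0)
```

Raising the threshold flips more components to benign, which matches the observed trade-off: FNR grows while FPR shrinks.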
5.7. Comparison with contemporary labeling practices
In this section we evaluate labeling practices used by some of the state-of-the-art detection methods by comparing them against the semi-manual labeling approach. The main challenge lies in the fact that the majority of existing labeling procedures cannot be reproduced, as they are not disclosed in full detail or rely on services that are not publicly available. Based on our analysis, we can only replicate the labeling used
by Antonakakis et al. (2010), as the majority of the employed tools and services are publicly available. Antonakakis et al. (2010) use a labeling procedure based on FQDN/IP blacklists and FQDN whitelists to label training data sets, as well as to evaluate the detection results. Table 10 illustrates the results of comparing the labeling of FQDNs and IPs performed as by Antonakakis et al. (2010) with the semi-manual approach. The two methods are deployed on the DS1 and DS2 data sets and the results are expressed using FNR and FPR. High values of FNR indicate that the semi-manual approach has much wider coverage, while the low values of FPR show that the vast majority of FQDNs and IPs flagged by Antonakakis et al. are also flagged by our approach. This indicates that the novel semi-manual approach covers a much wider set of malicious FQDNs and IPs while still encompassing the vast majority of FQDNs and IPs marked as malicious by Antonakakis et al. This can be attributed to the fact that semi-manual labeling is capable of discovering relations between malicious FQDNs and IPs by employing the principles of graph analysis.

Fig. 8 – Labeling of graph components. Automated labeling relying on FQDN blacklists and whitelists with decision threshold equal to 3. Results are given for data set 1 (DS1) and four separate weeks of data set 2 (DS2-Wx). [The figure plots FNR and FPR (0.0 to 1.0) against the number of whitelisted domains (0 to 100,000).]

Table 10 – Labeling of FQDNs/IPs as proposed by Antonakakis et al. (2010). Results are given for data set 1 (DS1) and four separate weeks of data set 2 (DS2-Wx).

Trace            FQDNs                  IPs
                 FNR      FPR           FNR      FPR
DS1  Week #1     0.9978   0.000374      0.9951   0.001403
DS2  Week #1     0.9984   0.000528      0.9940   0.000504
DS2  Week #2     0.9993   0.000782      0.9941   0.000339
DS2  Week #3     0.9992   0.000739      0.9877   0.002383
DS2  Week #4     0.9987   0.001788      0.9832   0.000842

Furthermore, by comparing the results presented in Table 10 and Table 9, we can conclude that the evaluated labeling approach performs worse in terms of coverage than the automated approach examined in Section 5.6. This can be explained by the fact that the automated
labeling employs 29 FQDN and 36 IP blacklists, which collectively have larger coverage but at the same time introduce a higher number of false positives than the relatively small set of blacklists used by Antonakakis et al. (2010). Finally, as the labeling practices employed by other contemporary detection approaches (Antonakakis et al., 2011; Bilge et al., 2014; Choi and Lee, 2012; Perdisci et al., 2012) are based on principles similar to the labeling used by Antonakakis et al., we are confident that similar results would be obtained if they were compared with the proposed semi-manual labeling approach.
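The blacklist/whitelist labeling practice replicated above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' exact tooling: the feed contents and function names are hypothetical, and the real automated procedure aggregates 29 FQDN and 36 IP blacklists. It shows the order of operations, with whitelisting of popular domains applied before the union of blacklist feeds is consulted.

```python
# Sketch of automated FQDN labeling: whitelist first, then the union of
# blacklist feeds. Feed contents and names below are hypothetical.

def label_fqdns(fqdns, blacklists, whitelist):
    """blacklists: list of sets of FQDNs; whitelist: set of popular FQDNs."""
    combined = set().union(*blacklists)   # union of all blacklist feeds
    labels = {}
    for fqdn in fqdns:
        if fqdn in whitelist:
            labels[fqdn] = "benign"       # whitelisting takes precedence
        elif fqdn in combined:
            labels[fqdn] = "malicious"
        else:
            labels[fqdn] = "unknown"
    return labels
```

Unioning many feeds widens coverage, but, as the comparison above shows, it also raises the number of false positives relative to a small, curated set of blacklists.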
6. Discussion

In this section we discuss the specifics of the proposed semi-manual labeling approach, the case study addressed by the paper, and opportunities for future work.

6.1. Targeting agile DNS

The paper addresses agile DNS traffic, as a communication strategy often used by Internet criminals and one of the main targets of modern DNS-based detection approaches. We argue that non-agile/static DNS techniques are less popular within the cybercriminal community due to the limited flexibility and reliability they offer. We are aware that non-agile DNS can still be used for malicious purposes, but we believe that it is more worthwhile to target agile DNS, as many contemporary DNS-based detection techniques and take-down methods are already quite effective against malicious static DNS.

6.2. FQDNs-to-IPs mapping analysis

The proposed semi-manual labeling method relies on DNSMap (Berger and Gansterer, 2013) for identifying agile FQDNs-to-IPs mappings. We chose DNSMap as a tunable pre-filtering approach that can be optimized to extract different forms of DNS agility. The proposed approach is therefore flexible and highly controllable, offering the ability to precisely target certain agile DNS patterns. Furthermore, we believe that the employed graph perspective is much better suited to the analysis of agile properties of DNS traffic than the analysis of single domains and IPs. We argue that the employed labeling by association is a great advantage, as many FQDNs/IPs that would not otherwise be blacklisted are flagged due to their association with rogue elements. As a result, the proposed labeling approach covers a wider set of FQDNs and IPs than conventional practices, as clearly shown by the evaluation results. Finally, the approach provides the great advantage of extracting a manageable number of agile groups that can be efficiently analyzed by a human operator.

The analysis of FQDNs-to-IPs mappings is recognized as a valuable means of identifying malicious agile DNS by other authors, such as Schiavoni et al. (2014). The authors proposed an approach for identifying DGA domains and finding the DGA-generated domains that are representative of the respective botnets. The approach of Schiavoni et al. is based on principles of tracking FQDNs-to-IPs mappings similar to those used in DNSMap. The main difference between the two approaches lies in the fact that DNSMap is able to capture more diverse agile DNS mappings, including both domain- and IP-flux. Furthermore, the two approaches use different principles for analyzing FQDNs-to-IPs mappings, i.e. Schiavoni et al. tailored their approach to identifying DGA-generated domains.

6.3. Operator's insight

The proposed semi-manual labeling defines a novel way of including the operator's insight in the process of DNS labeling, in order to safeguard against the false positives and false negatives that are an unavoidable characteristic of automated labeling approaches. We strongly believe that reliable security solutions should not exclude human insight, and the fact that an operator can participate in the decision making is a definite advantage. However, it should be noted that manual validation of labels would be nearly impossible if the operator needed to assess all individual FQDNs and IPs. This can be seen from Tables 3 and 4, where observed FQDNs and IPs are counted in thousands and millions. Furthermore, analyzing FQDNs and IPs individually would not reveal the agile properties of the DNS mappings, thus limiting the information upon which the operator can base a decision. The operator's involvement in labeling makes the proposed approach more time consuming than automated solutions based on blacklists and whitelists. However, based on the experimental results within the case study, the proposed approach requires only limited human involvement, which makes it a good candidate for deployment within operational ISP networks.

6.4. Evaluation of the proposed approach

The evaluation of the proposed approach was done using a DNS data set recorded in the network of a regional ISP. We believe that the data set is comprehensive enough to provide reliable insight into typical patterns of malicious and legitimate traffic. However, future evaluation should be done on additional data sets recorded at different times and in different networks. Furthermore, due to the smaller number of customers, traffic at the regional ISP is limited in volume in comparison to bigger ISPs. Using the proposed labeling approach on a network trace from a bigger network would result in a higher number of agile graph components extracted by DNSMap. The higher number of extracted graph components would put more strain on the operator, but as the operator needs less than a day to label a week of traffic from the existing trace, we believe there is room to scale the system to networks several times bigger.

The paper has evaluated the performance of automated labeling strategies and state-of-the-art labeling approaches by comparing them with the proposed semi-manual labeling approach. We assume that the ground truth obtained by the semi-manual labeling approach is in fact the "actual" ground truth. However, assessing the performance of the presented semi-manual labeling is difficult, as there is no "oracle" to determine the "true" nature of the FQDNs-to-IPs mappings. We believe that labeling within an approach where the operator's decision is assisted by all the performed analysis steps, from filtering to feature analysis and clustering, is the best that can be done. Altogether, this procedure makes the manual labeling fast and practically feasible.
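The labeling-by-association principle discussed in Section 6.2, where every FQDN and IP in a connected component of the FQDN-to-IP mapping graph inherits the malicious label of any flagged member, can be sketched as a small bipartite-graph computation. The sketch below is a simplified stand-in for DNSMap-style pre-filtering, and all data and names are hypothetical.

```python
# Sketch of labeling by association over a bipartite FQDN-to-IP graph
# (simplified stand-in for DNSMap-style analysis; all names hypothetical).
from collections import defaultdict

def connected_components(mappings):
    """mappings: iterable of (fqdn, ip) pairs; returns list of node sets."""
    adj = defaultdict(set)
    for fqdn, ip in mappings:
        adj[("fqdn", fqdn)].add(("ip", ip))
        adj[("ip", ip)].add(("fqdn", fqdn))
    seen, comps = set(), []
    for node in list(adj):
        if node in seen:
            continue
        stack, comp = [node], set()
        while stack:                      # iterative DFS over the component
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(adj[n] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def label_by_association(mappings, flagged_fqdns):
    """All nodes in a component share the label of any flagged member."""
    labels = {}
    for comp in connected_components(mappings):
        malicious = any(kind == "fqdn" and name in flagged_fqdns
                        for kind, name in comp)
        for node in comp:
            labels[node] = malicious
    return labels
```

This is why FQDNs and IPs that appear on no blacklist can still be flagged: they share a component with rogue elements.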
6.5. Future work
Future work will be directed at further development of the automated part of the approach, in order to reduce the number of errors within the provisional labeling of DNS traffic. We expect to improve the cluster analysis by refining the feature set used within the clustering algorithm. In addition to the FQDN blacklist features, we will consider the use of other analysis-level features for clustering. Furthermore, we will consider additional strategies for assigning provisional labels that do not rely on blacklist-based assumptions. Finally, we believe that evaluating the labeling approach on additional traffic traces will provide us with a better understanding of the discriminative features of agile DNS traffic, ultimately contributing to more accurate clustering results.
7. Conclusion
In this paper we elaborate on the ground truth problem of malicious DNS traffic and evaluate popular practices for obtaining ground truth that rely on third-party information, such as domain and IP blacklists and whitelists of popular domains. We introduce a novel semi-manual labeling practice that targets agile DNS as one of the main carriers of malicious DNS activity. The proposed approach is developed with the goal of obtaining ground truth by using the operator's insight in an efficient manner. The proposed approach is compared with contemporary labeling practices in a case study of DNS traffic from a regional ISP network. The experimental results not only confirm the importance of domain/IP address blacklists and domain whitelisting for the labeling of DNS traffic, but also indicate that blind reliance on them may lead to misleading conclusions about the analyzed traffic. Compared with automated labeling approaches that rely on domain/IP blacklists and domain whitelisting, the proposed semi-manual approach has shown better coverage, as it discovers suspicious domains/IP addresses based on their association with other rogue domains/IP addresses. Furthermore, the automated solutions produce a number of false positives, requiring human insight in order to safeguard against them. Finally, the proposed labeling approach has proven to incorporate the operator's insight in a time-efficient manner, making it a viable candidate for deployment within operational ISP networks.
REFERENCES
Abuse.ch. SpyEye tracker: SpyEye blocklist; 2013.
Abuse.ch. ZeuS tracker: ZeuS blocklist; 2014.
Amazon Inc. Alexa: the list of the most popular domains; 2014.
Antonakakis M, Perdisci R, Dagon D, Lee W, Feamster N. Building a dynamic reputation system for DNS. In: USENIX security symposium. 2010. p. 273–90.
Antonakakis M, Perdisci R, Lee W, Vasiloglou N II, Dagon D. Detecting malware domains at the upper DNS hierarchy. In: USENIX security symposium. 2011.
Antonakakis M, Perdisci R, Nadji Y, Vasiloglou N II, Abu-Nimeh S, Lee W, et al. From throw-away traffic to bots: detecting the rise of DGA-based malware. In: USENIX security symposium. 2012. p. 491–506.
Aviv AJ, Haeberlen A. Challenges in experimenting with botnet detection systems. San Francisco, CA: USENIX 4th CSET Workshop; 2011.
Berger A. Pydnsmap; 2014.
Berger A, Gansterer WN. Modeling DNS agility with DNSMap. In: INFOCOM. 2013. p. 3153–8.
Bilge L, Sen S, Balzarotti D, Kirda E, Kruegel C. Exposure: a passive DNS analysis service to detect and report malicious domains. ACM Trans Inf Syst Secur 2014;16(4):14.
Choi H, Lee H. Identifying botnets by capturing group activities in DNS traffic. Comput Netw 2012;56(1):20–33.
Damballa Inc. Top-10 TLDs abused by botnets for CnC; 2009.
Dietrich CJ, Rossow C. Empirical research of IP blacklists. In: ISSE 2008 securing electronic business processes. Springer; 2009. p. 163–71.
Holz T, Gorecki C, Rieck K, Freiling FC. Measuring and detecting fast-flux service networks. In: NDSS. 2008.
IPVoid. IP address blacklist checker tool; 2014.
Jain AK. Data clustering: 50 years beyond K-means. Pattern Recogn Lett 2010;31(8):651–66.
Kay B, Greve P. Mapping the Mal Web. Tech. Rep. McAfee, Inc.; 2011.
Kheir N, Tran F, Caron P, Deschamps N. Mentor: positive DNS reputation to skim-off benign domains in botnet C&C blacklists. In: ICT systems security and privacy protection. Springer; 2014. p. 1–14.
Knysz M, Hu X, Shin KG. Good guys vs. bot guise: mimicry attacks against fast-flux detection systems. In: INFOCOM, 2011 proceedings, IEEE. 2011. p. 1844–52.
Kührer M, Rossow C, Holz T. Paint it black: evaluating the effectiveness of malware blacklists. In: Research in attacks, intrusions and defenses. Springer; 2014. p. 1–21.
Lee J, Lee H. GMAD: graph-based malware activity detection by DNS traffic analysis. Comput Commun 2014;49:33–47.
Levine J. DNS blacklists and whitelists. RFC 5782, RFC Editor; February 2010.
Ma J, Saul LK, Savage S, Voelker GM. Beyond blacklists: learning to detect malicious web sites from suspicious URLs. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining. ACM; 2009. p. 1245–54.
Malware Domain List. List of malware-related domains; 2014.
MaxMind Inc. Databases of AS numbers; 2014a.
MaxMind Inc. Databases of countries; 2014b.
Mockapetris P. Domain names – implementation and specification. RFC 1035, RFC Editor; November 1987.
OpenStreetMap. On-line geocoder service; 2014.
Perdisci R, Corona I, Dagon D, Lee W. Detecting malicious flux service networks through passive analysis of recursive DNS traces. In: Computer security applications conference (ACSAC '09), IEEE. 2009. p. 311–20.
Perdisci R, Corona I, Giacinto G. Early detection of malicious flux networks via large-scale passive DNS traffic analysis. IEEE Trans Depend Sec Comput 2012;9(5):714–26.
Ramachandran A, Feamster N, Dagon D, et al. Revealing botnet membership using DNSBL counter-intelligence. In: Proc. 2nd USENIX steps to reducing unwanted traffic on the Internet. 2006. p. 49–54.
Schiavoni S, Maggi F, Cavallaro L, Zanero S. Phoenix: DGA-based botnet tracking and intelligence. In: Detection of intrusions and malware, and vulnerability assessment. Springer; 2014. p. 192–211.
Sheng S, Wardman B, Warner G, Cranor L, Hong J, Zhang C. An empirical analysis of phishing blacklists. In: Sixth conference on email and anti-spam (CEAS). 2009.
Silva SS, Silva RM, Pinto RC, Salles RM. Botnets: a survey. Comput Netw 2013;57(2):378–403.
Sinha S, Bailey M, Jahanian F. Shades of grey: on the effectiveness of reputation-based blacklists. In: 3rd international conference on malicious and unwanted software (MALWARE 2008), IEEE. 2008. p. 57–64.
Sommer R, Paxson V. Outside the closed world: on using machine learning for network intrusion detection. In: 2010 IEEE symposium on security and privacy (SP), IEEE. 2010. p. 305–16.
The Spamhaus Project Ltd. The Spamhaus Project: Spamhaus DNSBL; 2014.
Unspam Technologies, Inc. Project Honey Pot – directory of malicious IPs; 2014.
URLVoid. Website reputation checker tool; 2014.
Verisign. The domain name industry brief. Tech. Rep. VeriSign, Inc.; 2014.
Villamarín-Salomón R, Brustoloni JC. Identifying botnets using anomaly detection techniques applied to DNS traffic. In: 5th IEEE consumer communications and networking conference (CCNC 2008), IEEE. 2008. p. 476–81.
WOT Services Ltd. Web of Trust – reputation system; 2014.
Yadav S, Reddy AKK, Reddy A, Ranjan S. Detecting algorithmically generated malicious domain names. In: Proceedings of the 10th ACM SIGCOMM conference on Internet measurement. ACM; 2010. p. 48–61.
Yan P. OpenDNS security labs: how likely is a domain to be malicious? Here's a look at the stats and graphs that help us decide; 2013.

Matija Stevanovic received the M.Sc. in Electrical Engineering in 2011 from the Faculty of Electrical Engineering, Belgrade University, specializing in system engineering. He is currently a Ph.D. student in the Wireless Communication Section, Department of Electronic Systems, Aalborg University. His research interests include network security, traffic anomaly detection and botnet detection based on network traffic analysis.

Jens Myrup Pedersen received the M.Sc. in Mathematics and Computer Science in 2002, and the Ph.D. in Electrical Engineering in 2005, from Aalborg University, Denmark. He is currently Associate Professor in the Wireless Communication Section, Department of Electronic Systems, Aalborg University. His research interests include network planning, traffic monitoring, and network security. He is author/co-author of more than 70 publications in international conferences and journals, and has participated in Danish, Nordic and European funded research projects. He is also a board member of a number of companies within technology and innovation.

Alessandro D'Alconzo received the M.Sc. degree in Electronic Engineering with honors in 2003, and the Ph.D. in Information and Telecommunication Engineering in 2007, from the Polytechnic of Bari, Italy. Since 2007 he has been a Senior Researcher in the Communication Networks Area of FTW. His current research interests embrace network measurements and traffic monitoring, ranging from the design and implementation of statistical anomaly detection and automatic diagnosis, to Quality of Experience evaluation, and the application of secure multiparty computation techniques to interdomain network monitoring.

Stefan Ruehrup received his graduate degree in 2002 and his Ph.D. in Computer Science in 2006 from the University of Paderborn, Germany. He was a postdoctoral researcher at the University of Ottawa, Canada, and held lecturer and management positions in Germany. He is a senior researcher in the Communication Networks area at the Telecommunications Research Center Vienna (FTW), Austria. His research interests are the analysis and simulation of networks and communication protocols in mobile communications and ITS.

Andreas Berger received B.Sc. and M.Sc. degrees in Information and Communication Technology from the Technical University Graz, Austria, and a Ph.D. degree in Computer Science from the University of Vienna, Austria. His research interests include traffic monitoring and data analysis, with a focus on malware detection.