Journal of Network and Computer Applications 58 (2015) 144–154
BotFlex: A community-driven tool for botnet detection ☆
Sheharbano Khattak a,*, Zaafar Ahmed b,c, Affan A. Syed b,c, Syed Ali Khayam c
a The Computer Laboratory, University of Cambridge, United Kingdom
b SysNet, National University of Computer & Emerging Sciences, Pakistan
c PLUMgrid Inc., 830 E. Arques Ave, Sunnyvale, CA 94085, USA
Article info
Article history: Received 18 February 2014; received in revised form 24 August 2015; accepted 6 October 2015; available online 17 October 2015
Keywords: Botnet; Network security; Correlation

Abstract
Botnets currently pose the most potent threat to the security and integrity of networked systems. In this paper, we present our experiences of designing, implementing and evaluating BotFlex, which (to the best of our knowledge) is the first open-source network-based tool for botnet detection. BotFlex is designed to support extensibility (in detection parameters and decision elements), flexibility (in configuration), an easy-to-use interface, and real-time operation. While the tool is designed for extension and improvement by community inputs, we report very encouraging accuracy and performance results of our first-cut BotFlex implementation. On a 500 GB trace captured at an ISP with ground truth provided by a commercial security company, BotFlex provides TPR and FPR of 94.4% and 6.6%, respectively – comparable with our baseline state-of-the-art BotHunter tool (TPR: 79.6%, FPR: 6.6%). In addition to accuracy, we observe that BotFlex incurs negligible detection delay, while having good throughput (47 K packets/second) and low processing overhead. © 2015 Elsevier Ltd. All rights reserved.
1. Introduction
In view of the gravity of the botnet threat, the academic community has produced scores of research papers and reports to explain botnet behavior, topologies, sizes, detection parameters, defense strategies, and future trends. However, to date no community-driven collaborative tool exists that can be used to easily implement and test new botnet detection mechanisms. As a result, a novice researcher has no option but to implement existing literature from scratch even to tweak a single botnet detection parameter. With this significant barrier to entry, the most impactful botnet research remains largely confined to security labs and companies, without much comparison with other tools or techniques. A community-driven tool would offer the additional advantage of channelling time and resources towards improving the state-of-the-art, rather than reproducing what already exists in prior literature.¹
☆ Appeared as a poster in ACM International Conference on Computer and Communication Security (CCS) (Khattak et al., November 2013).
* Corresponding author: S. Khattak.
¹ A pertinent example to substantiate the positive impact of an effective tool is BotHunter (Gu et al., 2007), the only freely available bot detection tool. While BotHunter has been widely used by security researchers (500+ citations), we argue that building an open-source tool will help further accelerate botnet R&D.
http://dx.doi.org/10.1016/j.jnca.2015.10.002
1084-8045/© 2015 Elsevier Ltd. All rights reserved.
To bridge the above gap, this paper contributes BotFlex² – a community-driven network-based tool for botnet detection. BotFlex has been designed to be (i) domain-specific and community-driven, so that existing and new botnet detection solutions can be conveniently developed, improved upon, and/or benchmarked; (ii) flexible in fine-tuning its detection thresholds and conditions to cater to varying organizational/deployment accuracy and delay requirements, and extensible through easy integration of new detection parameters and decision elements to keep up with the rapidly-evolving botnet threat; (iii) prompt in processing information, for early detection of threats and subsequent activation of evasive countermeasures; and (iv) simple to use, with a user interface for defining botnet detection policies that improves end-user productivity.
² Full source code available at: 〈http://www.sysnet.org.pk/BotFlex〉.
We build BotFlex over the Bro intrusion detection system (IDS) (Paxson, 1998). BotFlex derives symptoms of botnet infection from events fed to it by Bro and correlates them to make the actual diagnosis. We highlight that BotFlex is a tool meant to help the community conveniently extend and update detection strategies in response to the botnet arms race. For the purpose of evaluation, we initialize BotFlex with detection parameters from existing literature, but these should not be deemed exhaustive.
Our second contribution is BotFlex's decision module, which marks the first time a CEP engine has been built within a non-proprietary NIDS. We conceptualize botnet detection as a complex event which can be detected via multiple trigger paths. To this end, we build BotFlex's decision module as a Complex Event Processing (CEP) engine (called the correlation framework), programmed by a custom rule language for Bro (Paxson, 1998).³ To tackle today's complex, multi-stage attacks, NIDSs are moving towards integration of external intelligence with direct analysis of traffic on the wire (Amann et al., 2012). Our correlation framework marks the next step in this direction by equipping NIDSs to readily understand and correlate events derived from disjoint sources. Furthermore, a CEP engine local to an IDS can directly ingest NIDS events and data structures, thus avoiding the overhead that external CEP engines inevitably incur as NIDS events are translated into a format the engine can understand. Though built with botnets in mind, our CEP engine is generic and modular so that it can be used to model any other network phenomenon with distributed events. Both BotFlex and its correlation framework are released as open source for community use (BotFlex, 2013).
Finally, we evaluate BotFlex for accuracy and performance over 500 GB of enterprise traffic collected from one of Pakistan's largest ISPs, Nayatel (Nayatel, 2013), with ground truth obtained using a one-time sample of Team Cymru's botnet reputation list for the collection duration. We initialize BotFlex with different signature-, intelligence- and behavior-based detection parameters in order to observe the efficacy of different combinations of these parameters in botnet detection. By tuning different detection parameters and correlation policies, made possible by the flexible nature of the tool, we achieve a TPR of 94.4% with an FPR of 6.6%. To baseline our accuracy results, we also run BotHunter (Gu et al., 2007) on the same dataset and observe TPR and FPR of 79.6% and 6.6%, respectively – hence validating our implementation against the only freely available tool.
Performance evaluation of BotFlex reveals consistent CPU usage on par with Bro, but the following remain areas of improvement for future work: (i) BotFlex increases the time the underlying NIDS (Bro) takes to process the 500 GB network trace (replayed in real-time traffic emulation mode) by 30 min, (ii) it reduces Bro's throughput by 4 K packets/second, and (iii) it uses nearly twice as much memory as Bro. Despite encouraging preliminary results, we acknowledge that BotFlex is still in its infancy and hope that its built-in design flexibility and extensibility will allow the community to extend it into an effective and actively updated tool.
2. Related work
We divide related work into two main areas: (i) tools, and (ii) mechanisms for botnet detection.
Tools for botnet detection: Most intrusion detection systems (e.g. Snort (Roesch, 1999), Bro (Paxson, 1998), Ourmon (Binkley and Massey, 2005), Scap (Papadogiannakis et al., 2013)) provide some support for botnet detection through signature- and anomaly-based methods. However, their general focus is too broad to classify them as dedicated botnet detection tools. The only network-based botnet detection tool that is also freely available is BotHunter (Gu et al., 2007). BotHunter maps communication exchanges between a local host and the Internet as potential steps in the lifecycle of a botnet (comprising inbound scanning, exploit usage, egg downloading, outbound bot coordination dialog, and outbound attack propagation). Lifecycle alerts are generated by a customized version of Snort, and are correlated by a separate dialog correlation engine according to predefined rules. BotHunter makes no provision for tuning its correlation parameters (alert weights, observation intervals, correlation rules). Finally, to the best of our knowledge, none of the public tools is capable of both vertical (multiple features and lifecycle events) and horizontal (across different hosts) correlation.
Mechanisms of botnet detection: A number of mechanisms have been proposed for botnet detection. We briefly cover detection mechanisms that directly relate to the present work. Signature-based approaches (Haq et al., 2014; Shin et al., 2013; Zand et al., 2014; Goebel and Holz, 2007; Roesch, 1999; Bilge et al., 2012a; Burghouwt et al., 2013; Yan, 2013) identify botnets by comparison with known patterns of botnet C&C communication extracted from observed samples. Correlation-based mechanisms detect botnets by attributing different patterns in network activity to botnets. Correlation can be further classified as horizontal or vertical. Horizontal correlation (Chen et al., 2013; Gu et al., 2008a, 2008b; Wang and Yu, 2009; Strayer et al., 2006, 2008; Yen and Reiter, 2008) captures the fact that botnets are coordinated malware, and therefore exhibit similarity in behavior and/or communication across different infected hosts. In vertical correlation (Yen et al., 2013; Burghouwt et al., 2013; Bilge et al., 2012a; Abu Rajab et al., 2006; Gu et al., 2007; Liu et al., 2008; Shin et al., 2013; Chen et al., 2013), the behavior of individual hosts over an observation interval is mapped to an established model of bot behavior. Top-down methods (Zhuang et al., 2008; Ramachandran et al., 2006; Villamarin-Salomon et al., 2008; Choi et al., 2007) associate coordinated malicious behavior, or its side effects observable at higher network elements, with botnets. Active techniques (Gu et al., 2010) manipulate flows or inject traffic to elicit responses from bots that are likely to give them away.
³ CEP (Luckham, 2001) is an emerging domain that continuously processes low-level events from distributed sources in real time to derive high-level information. Thus complex knowledge can be extracted as soon as relevant information becomes available, without the need for persistent storage.
3. BotFlex architecture
In this section, we highlight the design goals of BotFlex, followed by a discussion of the system architecture. Finally, we explain how BotFlex integrates with an existing open-source NIDS, Bro.
3.1. Design goals
Before describing the architectural elements of BotFlex, we outline high-level design goals that we expect BotFlex to meet. We refer to these goals later while describing the architecture (Section 3.2) and implementation (Section 4) of BotFlex.
Goal 1 – Community-driven and domain specific: The botnet research community currently lacks an open-source and community-driven tool. We want BotFlex to be used by the community as the platform to develop with ease, improve upon, and/or benchmark existing and new botnet detection solutions.
Goal 2 – Flexible and extensible: As botnets represent a rapidly-evolving threat, BotFlex must be extensible to allow easy integration of new detection parameters and decision elements to sustain its relevance for future threats and defenses. Furthermore, with different organizations and deployment scenarios having varying accuracy and delay requirements, the tool should allow flexibility in fine-tuning its detection thresholds and conditions.
Goal 3 – Simple user interface: BotFlex should provide a simple user interface for definition of botnet detection policies to improve productivity of the end-user.
Goal 4 – Timely information handling: The tool should promptly handle network information as and when it becomes available for early detection of threats. This provides the defender the opportunity to respond to incidents in a timely fashion.
Fig. 1. BotFlex architecture.
Fig. 3. Bot lifecycle.
Fig. 2. Architecture of BotFlex with respect to Bro.
3.2. Architecture
BotFlex's architecture (Fig. 1) comprises three modules: the blacklist manager, the sensor module and the correlation framework. The last two of these deal with symptoms and diagnosis of botnet infection, respectively. Decoupling these helps us make the tool more extensible (Goal 2) as (i) botnet infection symptoms can be independently developed/modified without regard to how they are correlated, and (ii) multiple correlation policies can share the same symptom(s), yet handle them differently.
Blacklist manager: The blacklist manager complements the sensor module in its operation by providing it with up-to-date intelligence, such as C&C and exploit blacklists.
Sensor module: The sensor module generates symptoms of botnet infection as events derived from network data. Using an event-driven model for the sensor module allows information to be processed as soon as it is produced, thus facilitating timely reaction (Goal 4). Events produced by the sensor module may be readily consumed (simple events) or derived by further processing (derived events). Derived events can optionally involve one or more iterations through the correlation framework. Based on the well-known (Abu Rajab et al., 2006; Gu et al., 2007) bot lifecycle events (Fig. 3), the sensor module eventually maps all simple and derived events to five high-level activity classes: inbound scan, host exploit, malicious binary (egg) download, C&C communication and attack.
Correlation framework: A botnet is complex malware which can manifest itself through a number of malicious activities with varying degrees of observability. The correlation framework continuously receives events from the sensor module and correlates them according to rule(s) specified by the user to derive the complex event of botnet infection. Note that we input botnet-related events to the framework, but in principle it is an independent, self-contained entity capable of processing any events fed to it. It is possible to turn off the correlation engine by commenting out a single line in the configuration file. After surveying different techniques, we implemented the correlation framework as a Complex Event Processing (CEP) engine (Luckham, 2001). CEP is an emerging concept that allows correlation of events in real time to detect a target complex event comprising multiple simple or complex events. CEP makes BotFlex (i) flexible in how sensor module alerts are correlated (Goal 2); (ii) extensible in facilitating addition of new correlation conditions (Goal 2); and (iii) faster to respond, because of a temporally-aware, event-driven model where information is processed as soon as it is produced and discarded when it is no longer relevant/needed (Goal 4). The present problem of botnet detection clearly conforms to the CEP model, where events are defined with respect to time, causality and aggregation (Luckham, 2001). Hence we treat botnet infection as a complex event which is deduced by correlating different trails of evidence gathered by the sensor module.
3.3. NIDS integration
The sensor module operates on events fed to it by an underlying NIDS (Fig. 1). We decided to build BotFlex on top of Bro (Fig. 2) as Bro's representation of network information as events helps us process information in a timely manner (Goal 4). Furthermore, Bro makes it possible to implement custom policies (botnet detection in our case) through a domain-specific scripting language, thus facilitating extension and usability (Goals 2 and 3). We now briefly discuss Bro's architecture and then how BotFlex fits into it. Bro comprises three layers: network, event and scripting (Fig. 2). The network layer uses libpcap to capture network traffic.
The event engine uses the received packets to create data structures and events, which can be captured and manipulated at the top layer (Bro script interpreter) through a Turing-complete, domain-specific scripting language. BotFlex resides at this layer, and both its sensor module and the correlation framework have been written entirely in Bro's scripting language.
The correlation framework is novel in that it marks the first time a CEP engine has been built within a non-proprietary NIDS with a view to accelerating information processing (Goal 4). This approach contrasts with how NIDS-CEP interaction is currently perceived (i.e. the CEP engine is an external entity to which NIDS alerts are fed). A CEP engine local to an IDS has the advantage of being able to directly ingest NIDS events and data structures. Time is a critical NIDS asset, and an external CEP engine inevitably incurs some overhead as NIDS events are translated to an intermediate format that can be understood by the engine. This phenomenon, which we call the translation overhead, becomes more pronounced for high-churn low-level events such as ‘a new TCP connection’. Furthermore, NIDSs are looking into integrating external intelligence with direct analysis of traffic on the wire (Amann et al., 2012) as today's complex, multi-stage attacks need more context. Our correlation framework marks the next step in this direction by equipping NIDSs to readily understand and correlate events derived from disjoint sources.
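As an illustration of the event-driven sensor model of Section 3.2, the following Python sketch derives a C&C symptom from low-level DNS events: it counts NXDOMAIN replies per host inside an observation interval and emits a derived event once a threshold is crossed. It is only a sketch under stated assumptions; BotFlex itself is written in Bro script, and the names `NxdomainSensor`, `on_nxdomain`, `threshold` and `window`, as well as the values used, are hypothetical and not taken from the BotFlex source.

```python
from collections import defaultdict, deque

# The five high-level activity classes named in Section 3.2.
ACTIVITY_CLASSES = ("inbound_scan", "exploit", "egg_download", "cnc", "attack")


class NxdomainSensor:
    """Hypothetical sensor: flags a host that gets too many NXDOMAIN replies."""

    def __init__(self, threshold=3, window=60.0, activity="cnc"):
        if activity not in ACTIVITY_CLASSES:
            raise ValueError("unknown activity class: %s" % activity)
        self.threshold = threshold      # replies needed to raise the symptom
        self.window = window            # observation interval in seconds
        self.activity = activity        # lifecycle class the symptom maps to
        self.seen = defaultdict(deque)  # host -> timestamps of recent replies

    def on_nxdomain(self, host, ts):
        """Handle one simple event; return a derived event or None."""
        times = self.seen[host]
        times.append(ts)
        # Discard observations that fell out of the observation interval.
        while times and ts - times[0] > self.window:
            times.popleft()
        if len(times) >= self.threshold:
            times.clear()  # start counting afresh after raising the symptom
            return {"host": host, "class": self.activity, "ts": ts}
        return None


sensor = NxdomainSensor(threshold=3, window=60.0)
events = [sensor.on_nxdomain("10.0.0.5", t) for t in (0.0, 10.0, 20.0)]
```

Only the third reply yields a derived event; the first two return `None` because the threshold has not yet been reached. Keeping the threshold and the interval as constructor arguments mirrors the tunable attributes that the text attributes to the sensor module.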
4. BotFlex implementation
In this section, we provide details of our implementation of BotFlex's architectural elements, followed by a description and examples of its user interface.
4.1. Blacklist manager
BotFlex uses a custom blacklist manager to pull intelligence of interest. Our blacklist manager is a bash script that downloads a number of public blacklists (BotFlex, 2013), organizes the intelligence based on its subject (IP, URL, subnet, port) and normalizes it according to a specific format. The script needs to be invoked regularly, for example through cron, so that the intelligence is up to date. When BotFlex starts, it uses Bro's input framework (Amann et al., 2012) to read the blacklists into a hash table for fast querying. This table is synchronized with the blacklists and is automatically updated whenever the source blacklists are modified. The blacklist manager is flexible (Goal 2) in that it can be replaced with any other service as long as the blacklists adhere to the file format used by BotFlex.
4.2. Sensor module
The sensor module generates possible evidence of botnet infection. We compiled a preliminary list of detection parameters from existing literature that the sensor module associates with botnet behavior (summarized in Table 1). The sensor module is flexible (Goal 2) because its attributes are tunable: the threshold values used to trigger various events, their weights and observation intervals (or time windows) are all configurable.
Table 1. Detection parameters used by BotFlex Sensor Module.
Phase | Detection parameters
Inbound scan | IP sweep, portscan: high failed connection rate (Paxson, 1998)
Exploit | Match with blacklists of hosts involved in push and pull style exploits
Egg download | Executable file downloaded that (i) has a small size, (ii) has an unusual extension (Khan et al., May 2011), or (iii) has a match in Malware Hash Registry (Bro, 2013b)
C&C | Too many DNS NXDOMAIN (Yadav and Reddy, Sep. 2011); C&C or RBN blacklist match
Attack | Outbound spam: too many SMTP connections, too many DNS MX queries (CBL, 2013). Outbound IP sweep, outbound portscan: high failed connection rate (Paxson, 1998)
4.3. Correlation framework
In this section, we first describe the architecture and functional model of the correlation framework (Section 4.3.1). Next we discuss some framework features which make it comply with BotFlex's underlying design goals (Section 4.3.2).
4.3.1. Architecture and functional model
We describe a typical correlation cycle, discussing various framework components along the way (Fig. 4). The correlation cycle is marked by two distinct phases: capture & feed, and store & match. For the sake of clarity, we use the following running example to illustrate the role of different components.
Example 1. Generate an alert if a host talks to a host in Dystopia and then downloads a file within a 5 min time period.
Capture & Feed: A user-defined event handler captures the interesting event and converts it to a stream, a format that can be readily understood by the framework. Mapping this to Example 1, both the events ‘host talked to host in Dystopia’ and ‘host downloaded a file’ need to be converted to stream format. Next, the stream is passed on to the correlation framework along with the ID of the correlation item and the index to which the stream corresponds. The correlation item is a complete unit of operation for the correlation framework. It comprises a unique correlation ID (e.g. dystopia_bots) and one or more associated filters. A filter has two basic parts: (i) a rule that describes how to correlate events (e.g. the event ‘host H talks to a host in Dystopia’ followed by ‘host H downloads a file’, all within M minutes), and (ii) an action that specifies the steps to execute when the correlation rule yields true (‘generate alert’ in Example 1). An index is the subject of a stream, e.g. an IP address, a subnet or any arbitrary string (e.g. URL, certificate, username). The framework stores and organizes information about the streams it sees according to the correlation item and index.
Store & Match: The correlator receives incoming streams and updates the corresponding history (information about previously received streams). Next, a check is performed to see if the correlation rule is satisfied, in which case the action part is invoked and the event is optionally logged. The log contains information about when the correlation started, when the rule was satisfied, the correlation item, the index and the history accumulated. Otherwise, the framework waits for more streams until the time window expires (5 min in Example 1).
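The capture & feed and store & match phases can be sketched for Example 1 in a few lines of Python. This is a minimal illustration under stated assumptions, not the BotFlex implementation (which is written in Bro script): the class `VerticalFilter`, its `feed` method and its fields are hypothetical names, the rule is reduced to a plain ordered pattern, and the time window is assumed to open at the first stream that matches the pattern.

```python
from dataclasses import dataclass, field


@dataclass
class VerticalFilter:
    """Hypothetical filter: an ordered event pattern plus a time window."""
    order: tuple                  # stream names that must occur in this order
    window: float                 # seconds to keep history for one index
    alerts: list = field(default_factory=list)
    history: dict = field(default_factory=dict)  # index -> (start_ts, names)

    def feed(self, name, index, ts):
        """Store one stream, then check whether the rule is now satisfied."""
        start, names = self.history.get(index, (ts, []))
        if ts - start > self.window:
            start, names = ts, []           # window expired: purge history
        # Store only a stream that extends the expected order.
        if len(names) < len(self.order) and name == self.order[len(names)]:
            if not names:
                start = ts                  # window opens at the first match
            names = names + [name]
        if tuple(names) == self.order:
            self.alerts.append((index, start, ts))  # action: generate alert
            self.history.pop(index, None)           # rule met: purge history
            return True
        self.history[index] = (start, names)
        return False


# Example 1: connection to Dystopia, then a file download, within 5 min.
f = VerticalFilter(order=("CONN_DYSTOPIA", "FILE_DOWNLOAD"), window=300.0)
f.feed("CONN_DYSTOPIA", "10.0.0.5", ts=0.0)
matched = f.feed("FILE_DOWNLOAD", "10.0.0.5", ts=120.0)
```

Here the second call satisfies the rule, so an alert tuple of index, window start and match time is recorded and the history for that index is purged. Had the download arrived after more than 300 s, the stale history would have been dropped instead and no alert raised.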
A time window specifies how long to keep track of previous streams. A new time window is initiated every time a rule is satisfied or a previous window expires. In both cases, all accumulated history is purged.
Fig. 4. Implementation of correlation framework.
4.3.2. Correlation framework and BotFlex design goals
Being the core decision element, the correlation framework has a number of features that help BotFlex meet its design goals.
Flexibility: The correlation framework features event prioritization (i.e. manipulating events at different levels of sensitivity) instead of treating them as binary values (happened/did not happen). Users can assign a weight to a stream before it enters the correlation framework and manipulate its value in the correlation rule. Another interesting feature of the framework is multiresolution analysis: it can present output events (history) at a different resolution than the one at which it processed them at input. This is realized by allowing users to assign an alternate name to a stream. While a stream's name is used to match it with the correlation rule, the alternate name is used in the output history when the correlation rule is satisfied. For example, the framework may process outbound bot attack activities at a higher granularity (e.g. DDoS, Spam, Scan) to flexibly handle individual activity types, while the output history can use the broader term ‘attack’. Such higher-to-lower resolution mapping helps in identifying generic trends, zooming out from instance-level details. The reverse case of lower-to-higher resolution mapping is also possible, where more generalized correlation policies can be devised with details reflected in the output.
Extensibility: The correlation framework supports multiperspective analysis, as it allows the user to define multiple correlation policies for the same correlation item. Essentially, the correlation item (e.g. IRC_botnets) acts as an umbrella covering a number of correlation policies differing in triggering rules and accompanying action. We found that a sample correlation filter based on sensor module stimulants (Section 4.2) takes 103 lines of code (LOC). We implemented the same single correlation filter without our correlation framework and the LOC nearly doubled (209 lines). Adding 24 more filters to the single correlation filter increased the code by only 403 lines.
Domain specificity: The correlation framework features multidimensional correlation to cater for correlation styles commonly used in the botnet domain. A number of existing botnet detection mechanisms are broadly based on horizontal and vertical correlation (Section 2). Vertical correlation performs temporal analysis of events generated for a single entity. Example 1 (Section 4.3.1) illustrates vertical correlation. In contrast, horizontal correlation deals with spatial analysis of an event pattern for multiple entities. The latter is common in centralized botnets, as bots respond to C&C commands with a pre-programmed response, such as sending spam, performing an outbound scan, or launching DDoS. A possible horizontal flavor of Example 1 is the following.
Example 2. Generate an alert if 20 hosts are observed where each host talks to a host in Dystopia and then downloads a file within a 5 min time period.
Simple user interface: We have implemented a declarative correlation rule language that allows processing streams according to order and/or an expression. Event order can optionally include event cardinality, e.g. two instances of event X followed by three instances of event Y. Expression-based rules make it possible to process events according to statistically derived models (e.g. regression model (Gu et al., 2007)). In addition to event cardinality, these rules can take event sensitivity (weight) into account.
4.4. Interface
BotFlex exposes a very simple interface to users with a view to facilitating tool extension and refinement (Goal 3). BotFlex is completely integrated into the domain-specific scripting language of Bro. We now walk through the high-level interface using Example 1 (Section 4.3.1). First, we need to add the filter and corresponding correlation item to the correlation framework. We first define a function that will be called when the correlation rule yields true. Next, we define a correlation rule that only has an order part that matches the event pattern connection to Dystopia (CONN_DYSTOPIA) followed by file download (FILE_DOWNLOAD) (the complete description of our rule language can be found on our project website (BotFlex, 2013)). We then add the correlation item to the framework, specifying its ID (test_corr) and a filter record (same as C's struct data type). To put things together, the filter tells the framework the time window over which to accumulate event history, the correlation rule, the function to call when the correlation rule is met, the filter type (horizontal/vertical), whether to log the correlation result, and whether to use alternate stream names in the history that is output to the user. The correlation framework is now ready to receive streams for this correlation as follows:
Bro generates the event connection_established when two hosts establish a TCP connection through completion of the three-way handshake. We then extract the connection's source and destination IP addresses from the connection record passed in the event parameters. We use a condition to select only those connections whose destination IP is located in Dystopia. The remaining lines deal with interfacing the event to the correlation framework. Next we populate a stream record with its name and weight. Optionally, we can also specify a value for $alt_name, which will be used in the history output to the user instead of the stream's $name. The procedure to input the stream FILE_DOWNLOAD to the correlation framework is exactly the same. To implement Example 2, a horizontal correlation rule, only two changes need to be made to Correlation::add_correlation_item. First, $filter_type would change to horizontal; second, an additional line $horizontal_threshold = 20 would be added.
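The horizontal flavor (Example 2) and the use of alternate stream names can be illustrated with the following Python sketch. It is not Bro script and does not reproduce the `Correlation::add_correlation_item` interface: the classes `Stream` and `HorizontalFilter` and their methods are hypothetical, only the field names `name`, `weight` and `alt_name` echo the record fields mentioned in the text, and the threshold is lowered from 20 to 2 to keep the example short.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Stream:
    """Stream record with the fields named in the text."""
    name: str                       # matched against the correlation rule
    weight: float = 1.0             # sensitivity; unused by this order-only rule
    alt_name: Optional[str] = None  # label shown in the output history

    def label(self):
        return self.alt_name or self.name


class HorizontalFilter:
    """Hypothetical filter: fires once `threshold` distinct indices match."""

    def __init__(self, order, window, threshold):
        self.order = tuple(order)
        self.window = window
        self.threshold = threshold
        self.start = None            # timestamp at which the window opened
        self._purge()

    def _purge(self):
        self.progress = {}           # index -> pattern steps seen so far
        self.labels = {}             # index -> labels collected so far
        self.matched = {}            # index -> history of a completed pattern

    def feed(self, stream, index, ts):
        if self.start is None or ts - self.start > self.window:
            self.start = ts          # open a new window, dropping old history
            self._purge()
        step = self.progress.get(index, 0)
        if index not in self.matched and stream.name == self.order[step]:
            self.progress[index] = step + 1
            self.labels.setdefault(index, []).append(stream.label())
            if step + 1 == len(self.order):
                self.matched[index] = self.labels.pop(index)
        if len(self.matched) >= self.threshold:
            result = dict(self.matched)
            self.start = None        # rule satisfied: next feed starts afresh
            return result
        return None


conn = Stream("CONN_DYSTOPIA")
download = Stream("FILE_DOWNLOAD", alt_name="egg")
hf = HorizontalFilter(("CONN_DYSTOPIA", "FILE_DOWNLOAD"), 300.0, threshold=2)
hf.feed(conn, "A", 0.0)
hf.feed(download, "A", 10.0)
hf.feed(conn, "B", 20.0)
result = hf.feed(download, "B", 30.0)
```

The rule matches on the stream's `name`, while the returned history uses `alt_name` where one is set, which is the multiresolution behavior described in Section 4.3.2. Switching between the vertical and horizontal flavors amounts to changing the filter type and adding a host threshold, in line with the two changes the text lists for Example 2.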
5. Evaluating BotFlex
We now evaluate BotFlex for its accuracy, detection delay, and performance. We would like to reiterate that while these results provide a touchstone to evaluate BotFlex, the real strength of the work lies in the ability to extend botnet detection capability through its open-source nature and user-friendly interface.
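Accuracy in this section is reported as true positive rate (TPR) and false positive rate (FPR). As a reminder of how the two rates relate to host counts, the Python sketch below uses the dataset figures given in Section 5.1 (108 bot-labeled hosts among 381 local IP addresses). The confusion counts passed to `rates` are hypothetical: they are chosen only because they reproduce the reported percentages, and the paper does not state them.

```python
def rates(tp, fp, positives, negatives):
    """Return (TPR, FPR) as percentages rounded to one decimal place."""
    tpr = 100.0 * tp / positives   # share of labeled bots that were detected
    fpr = 100.0 * fp / negatives   # share of unlabeled hosts that were flagged
    return round(tpr, 1), round(fpr, 1)


BOTS = 108             # hosts labeled as bots by the ground truth
HOSTS = 381            # local IP addresses in the trace
CLEAN = HOSTS - BOTS   # 273 hosts not labeled as bots

# Hypothetical confusion counts, consistent with the reported rates only.
botflex = rates(tp=102, fp=18, positives=BOTS, negatives=CLEAN)
bothunter = rates(tp=86, fp=18, positives=BOTS, negatives=CLEAN)
```

Under these assumed counts, 102 of 108 detected bots gives the 94.4% TPR reported for BotFlex, 86 of 108 gives BotHunter's 79.6%, and 18 false alarms among 273 unlabeled hosts gives the shared 6.6% FPR.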
5.1. Evaluation dataset and methodology
We evaluate BotFlex over a 500 GB data trace obtained from one of Pakistan's leading ISPs, Nayatel (Nayatel, 2013). This data was collected at a link with an average data rate of ≈ 28 K packets per second, and with 80/tcp being the most common port, followed by 443/tcp. The traffic contained 48,606 unique IP addresses, of which 381 IP addresses represented Nayatel's local network statically assigned to mid-to-small size enterprises. While we assume that an IP address represents a single machine, we acknowledge that the ISP customers are likely using their public IPs to represent multiple private (NATed) hosts. However, this assumption of a one-to-one mapping between public IPs and bots provides a realistic test environment for deployment of a real-world bot detection solution.
We obtained ground truth for this data from Team Cymru, a commercial company that specializes in security products and research. The company maintains a proprietary threat repository of known C&C servers identified with the help of a chain of globally-deployed sensors. We were provided with an hourly updated list of C&C servers (referred to as the ground truth blacklist henceforth) that were being actively contacted by hosts from the monitored region during our data collection at the ISP's B-RAS (between 08:54:49 AM and 04:13:29 PM PKT on September 18th, 2012). We labeled our data on the principle that the hosts in our dataset that communicate with the ground truth C&C servers are bots. We acknowledge that this blacklist of known C&C servers is biased in favor of TP (sensitivity) at the expense of TN (specificity), as the company claims to have nearly zero false positives from their IP reputation feed. The ground truth identified 108 (28.3%) of the total 381 IP addresses in the data trace as compromised bots.
To evaluate performance, tests were conducted on a Core i5 machine @ 3.30 GHz (note: Bro is single-threaded) with 4 GB RAM, a 7200 rpm hard disk, and 3 MB of L3 cache, using top and oprof. To replicate operation on live traffic, we used Bro in its pseudo-realtime mode (Bro, 2013b). This mode imitates real-time behavior by injecting artificial delays into packet processing based on timestamp differences between successive packets.
5.2. Accuracy evaluation
We take a bottom-up approach in evaluating BotFlex in terms of accuracy. Thus, we first identify the best thresholds for the sensor module, and then compare different correlation rules to identify the most effective bot detection system. We run BotHunter (Gu et al., 2007) on the same dataset as a baseline to validate our results. We treat BotHunter as a black box running with its default configuration. BotHunter uses a customized version of Snort (Roesch, 1999) to analyze network traffic exchanged between a local host and the Internet to detect various malicious activities comprising inbound scanning, exploit usage, egg downloading, outbound bot coordination dialog, and outbound attack propagation. A separate JAR file called the dialog correlation engine uses predefined rules to continuously correlate alerts from the previous step to detect potential botnet infection. BotHunter is not open-source software and hence cannot be extended by third parties. Also, it makes no provision for tuning its correlation parameters (alert weights, observation intervals, correlation rules). For a fair evaluation, we do not provide the ground truth blacklist to either BotHunter or BotFlex. Our evaluation methodology might seem tailored to the data; however, it is a practical approach, as various parameters in security tools need to be adjusted by network administrators according to their requirements. We reiterate that the important aspect here is that BotFlex allows easy reconfiguration of its operational and decision settings.
5.2.1. Sensor module – determining suitable thresholds

We observe the impact of different threshold values for threshold-based detection parameters in the sensor module (Section 4.2) on bot detection. We then use best ROC operating points to identify suitable thresholds. This is important because botnet-centric correlation in the next step is based on the soundness of sensor module results. We acknowledge that the use of a bot-labeled ground truth to measure accuracy of specific sensors (spam, scan, etc.) is not perfect; however it remains a more rigorous approach than setting arbitrary thresholds.

5.2.2. Correlation framework – determining a suitable correlation rule

After establishing suitable settings for the sensor module, we formulate correlation rule(s) for botnet detection. Our objective is to identify the strengths and weaknesses of various correlation policies. We devise a number of possible rules (Table 2) and plot the result of each run on an ROC curve. We acknowledge that the
Table 2
Description of BotFlex correlation rules based on sensor events presented in Table 1.

Name (a): Rule. Description.

Rule 1: (cnc_blacklist OR cnc_other) AND (exploit OR egg OR attack).
    C&C communication and any other activity related to host exploit, malicious binary download or attacking other hosts.
Rule 2: cnc_blacklist OR (cnc_other AND (exploit OR egg OR attack)).
    Same as 1, with a direct trigger whenever there is a C&C blacklist match.
Rule 3: (exploit OR egg) AND (cnc_blacklist OR cnc_other OR attack).
    Evidence of any inbound host compromise and outbound C&C communication or attacking other hosts.
Rule 4: cnc_blacklist OR ((exploit OR egg) AND (cnc_other OR attack)).
    Same as 3, with a direct trigger whenever there is a C&C blacklist match.
Rule 5: (egg AND (cnc_blacklist OR cnc_other)) OR (attack AND (cnc_blacklist OR cnc_other)) OR (egg AND attack).
    Download of malicious binary and outbound C&C or attacking other hosts, or C&C communication and attacking other hosts.
Rule 6: cnc_blacklist OR (egg AND cnc_other) OR (attack AND cnc_other) OR (egg AND attack).
    Same as 5, with a direct trigger whenever there is a C&C blacklist match.

(a) Rule{1,…,6}b variations include an OR rule for hosts tagged as part of a spam or scan campaign through horizontal correlation.
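The rules in Table 2 are plain Boolean expressions over sensor events, so they can be illustrated as predicates over the set of events seen for a host. The following is a minimal Python sketch and not BotFlex's actual rule language: the set-of-strings representation, the helper `has`, and the campaign tag names `spam_campaign` and `scan_campaign` are assumptions made for illustration.

```python
from typing import Callable, Set

Events = Set[str]  # sensor event names observed for one host in one window


def has(events: Events, *names: str) -> bool:
    """Return True if at least one of the named events was observed."""
    return any(name in events for name in names)


def rule1(e: Events) -> bool:
    return has(e, "cnc_blacklist", "cnc_other") and has(e, "exploit", "egg", "attack")


def rule2(e: Events) -> bool:
    return "cnc_blacklist" in e or (
        "cnc_other" in e and has(e, "exploit", "egg", "attack")
    )


def rule3(e: Events) -> bool:
    return has(e, "exploit", "egg") and has(e, "cnc_blacklist", "cnc_other", "attack")


def rule4(e: Events) -> bool:
    return "cnc_blacklist" in e or (
        has(e, "exploit", "egg") and has(e, "cnc_other", "attack")
    )


def rule5(e: Events) -> bool:
    cnc = has(e, "cnc_blacklist", "cnc_other")
    return ("egg" in e and cnc) or ("attack" in e and cnc) or ("egg" in e and "attack" in e)


def rule6(e: Events) -> bool:
    return (
        "cnc_blacklist" in e
        or ("egg" in e and "cnc_other" in e)
        or ("attack" in e and "cnc_other" in e)
        or ("egg" in e and "attack" in e)
    )


def with_horizontal(rule: Callable[[Events], bool]) -> Callable[[Events], bool]:
    """Build the 'b' variant: OR the rule with a campaign tag.

    The tag names 'spam_campaign' and 'scan_campaign' are hypothetical.
    """
    return lambda e: rule(e) or has(e, "spam_campaign", "scan_campaign")


rule3b = with_horizontal(rule3)
```

Written this way, the difference between the odd and even rules is easy to see: a host with only a `cnc_blacklist` event fires Rules 2, 4 and 6 but not Rules 1, 3 and 5.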
Fig. 5. ROC curve for botnet detection with BotFlex and BotHunter (x-axis: false positive rate; y-axis: true positive rate). A description of rules can be found in Table 2. Where multiple rules fall on the same data point, we separate them by commas. Points labeled in the plot: Rule-1,Rule-2; Rule-1b,Rule-2b; Rule-3,Rule-4; Rule-3b,Rule-4b; Rule-5,Rule-6; Rule-5b,Rule-6b.
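Each point on an ROC curve is a (false positive rate, true positive rate) pair computed at host level. The sketch below shows one way such a point could be derived and how a "best" operating point might be chosen. The selection criterion, Youden's J (TPR minus FPR), is an assumption, since the text does not state how the best point was picked. The host identifiers are synthetic and only reproduce the counts reported in this section (381 local hosts, 108 ground truth bots, and 102 true positives and 18 false positives for Rule 3).

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class RocPoint:
    label: str
    tpr: float  # true positive rate
    fpr: float  # false positive rate


def roc_point(label: str, flagged: Set[str], bots: Set[str], hosts: Set[str]) -> RocPoint:
    """Compute one ROC point from host-level detections and ground truth."""
    benign = hosts - bots
    tp = len(flagged & bots)
    fp = len(flagged & benign)
    return RocPoint(label, tp / len(bots), fp / len(benign))


def best_operating_point(points: Iterable[RocPoint]) -> RocPoint:
    """Pick the point maximizing Youden's J = TPR - FPR (an assumed criterion)."""
    return max(points, key=lambda p: p.tpr - p.fpr)


# Synthetic host IDs that reproduce the counts reported in this section.
hosts = {f"h{i}" for i in range(381)}
bots = {f"h{i}" for i in range(108)}
flagged = {f"h{i}" for i in range(102)} | {f"h{i}" for i in range(108, 126)}

rule3_point = roc_point("Rule 3", flagged, bots, hosts)
bothunter_point = RocPoint("BotHunter", 0.796, 0.066)  # rates as reported in the text
best = best_operating_point([rule3_point, bothunter_point])
```

With these counts the Rule 3 point works out to 102/108 and 18/273, which matches the reported TPR of 94.4% and FPR of 6.6%.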
current list of rules is not exhaustive, and we expect contributions to the rule base as BotFlex is used for botnet detection. Also, note that we removed the detection parameter 'inbound scan' from our rules after noticing that it is nearly omnipresent (triggered for 341 of our total 381 ISP local hosts), possibly because of legitimate applications (e.g. p2p peer discovery) and Internet background noise.

We now discuss insights gathered from the ROC curve for correlation rules (Fig. 5). Vertical correlation rules indicate that relying heavily on evidence of C&C communication produces low detection rates. This can be improved by complementing C&C evidence with other detection parameters. Since Rule 3 achieves the best TP-FP balance by requiring evidence of any inbound host compromise and outbound C&C or attack, we use it as a reference point for all subsequent discussion.

For the rules Rule{1,…,6}b involving horizontal correlation, we detect 23 hosts through coordination in spam-like activities, and another 17 based on synchronization in outbound scan timings. We generally find that horizontal correlation increases TP with no effect on FP, with the exception of one host in Rule 3b and Rule 4b (this host was later found to be a spambot through manual investigation). Hence horizontal correlation can possibly identify previously undetected bots based on their activity coordination with other botnet members.

The above results correspond to a window size of 9 h for correlating events related to botnet infection. We want to see how changing the window size affects BotFlex's detection rate. For this purpose, we repeat the experiment with the same rules (Table 2) but window sizes of 1 h and 3 h, respectively. We find that window size has a small effect on the FP of different rules (ranging between 12 and 18 for both window sizes). However, TP is
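Fig. 6. Distribution of detection parameters in BotFlex's sensor module across its true positives, false positives and false negatives (x-axis: detection parameters in BotFlex's sensor module; y-axis: hosts in which present (%)). Parameters shown: active exploit blacklist, passive exploit blacklist, bad extension exe, small exe download, malware hash repo match, C&C blacklist, RBN blacklist, SMTP connections, outbound scan, DNS MX queries, failed queries.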
reduced for smaller window sizes (lowest 70 for the 1 h window and 84 for the 3 h window).

To sum up, BotFlex (with Rule 3) detects 102 of the ground truth 108 bots, with a TPR of 94.4% and an FPR of 6.6%. These results are comparable with our baseline tool, BotHunter, which on the same trace gave a TPR of 79.6% and an FPR of 6.6%.

5.2.3. The impact of sensor module events on botnet detection

Now we quantify the impact of different detection parameters employed in BotFlex's sensor module (Section 4.2) on its TP, FP and FN rates. To this end, we investigate how much each detection parameter contributes to the detection process. Figure 6 shows this contribution where, for each detection parameter, we list the percentage of hosts in BotFlex's original TP/FP/FN (using Rule 3) that had this parameter present.

We observe that our FP are caused by outbound scan alerts for hosts using p2p clients and by incorrect egg download alerts for small software updates with uncommon extensions (e.g. bundle). As for passive exploit blacklists (web exploits), their alert profusion can mean two things: either the blacklists are unreliable or exploit-serving websites are fairly prevalent (also suggested by existing studies (Provos et al., 2007)). FP hosts with a high number of SMTP connections and DNS MX queries were found to be running mail server software. We also observe that some of our FP are genuine in that they correspond to bots not included in our conservative ground truth (discussed below under 'The not-so-benign hosts').

On the FN front, BotFlex missed detecting 6 bots. Nonetheless, its sensor module did record suspicious activity in logs for all except two of these hosts. These four hosts had a random mix of activities, with a high number of DNS NXDOMAIN responses being the most
Fig. 7. Distribution of BotFlex and BotHunter TP, FP with respect to C&C blacklists (y-axis: contribution to true positive and false positive (%); bars for BotFlex and BotHunter).

Fig. 9. Unique ground truth bots detected by BotFlex, BotHunter and through ground truth blacklist over 30 min intervals (x-axis: time, 09:25 to 16:08; y-axis: ground truth bots detected (%)).
prevalent. However, these were not enough to trigger the detection rule at any single time.

The not-so-benign hosts: From manual analysis of BotFlex's FP, we find some bots that were not flagged by the ground truth. This conforms with our earlier assertion (Section 5.1) that the ground truth is more sensitive than specific, and hence we cannot completely rule out the possibility that a FP bot is actually benign. Of the 18 FP hosts, 3 clearly had profiles typical of bots (Fig. 8), while another 6 were seen engaged in suspicious activities. In particular, a high number of DNS NXDOMAIN responses was the most common anomaly we observed for FP hosts that we believed were bots. The pattern of domain querying was similar to that employed by malware DGA, and some of the queried domains were found in recent C&C blacklists too.

Fig. 8. BotFlex output profiles.

5.2.4. A case study – Virut

At the start of the year 2013, CERT Polska took over 43 C&C servers associated with the Virut botnet and published a report (CERT, 2013) which highlighted Pakistan as one of the top three countries where half of the Virut bots were located. We find that our trace contains 11 Virut bots based on contact with Virut C&C domain names revealed by the report. Of these, 10 IPs are also flagged as bots as per our ground truth blacklist. BotFlex detects all 11 of the Virut bots, including the one not listed in the ground truth blacklist. BotFlex reported contact with the Palevo C&C blacklist (Tracker, 2013), aggressive outbound scan and spam activities to be common across all the Virut bots.

Before moving on to other performance evaluation metrics, we emphasize that BotFlex shows encouraging accuracy results using simple detection features taken from already published research. While this accuracy can (and hopefully will) be improved as the community introduces new detection features and correlation rules in BotFlex, we assert that many interesting features have already been proposed in existing literature, as this first-cut BotFlex implementation can detect bots that are not even flagged by the ground truth.

5.3. Detection delay

Timely threat detection is crucial as it allows for evasive countermeasures to be deployed for threat mitigation. Hence we now analyze how long it takes for BotFlex to reach its maximum detection with respect to the ground truth (based on local hosts contacting ground truth C&C servers). Figure 9 shows the number of unique, and true, bots detected by BotFlex and BotHunter over 30 min windows. In all cases, a bot is counted only at its first detection. The ground truth blacklist identifies 50% of the total bots by the end of the first 30 min. The remaining bots are gradually detected over the next 6.5 h. BotFlex detects bots proportionally with the ground truth, with the maximum lag in terms of bots detected never exceeding 11. In fact, during the first 1.5 h, BotFlex lags behind by merely a few bots (as low as 2). BotHunter lags behind the ground truth by an average of 23 bots.

Intrigued by this result, we investigated why BotHunter's detection lags the ground truth while BotFlex is able to keep up. Our preliminary analysis implies that this delay is a consequence of BotHunter's excessive reliance on its blacklists. Our hypothesis is that since the BotFlex detection Rule 3 (Table 2) incorporates evidence other than C&C communication, it can detect bots even before an actual communication to the C&C server. We verify this by counting the number of bots that are exclusively identified using only the C&C blacklists present with BotHunter and BotFlex. Figure 7 reveals that 96% of BotHunter's detections were based on a C&C blacklist match. In contrast, 52% of BotFlex detections from Rule 3 had at least one non-C&C blacklist trigger, which contributed to an early detection.
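The detection-delay comparison above rests on two steps: counting each bot only at its first detection, and comparing cumulative counts per 30 min window against the ground truth. A minimal sketch of that bookkeeping follows. The function names and the toy alert data are hypothetical and are not taken from the trace or from BotFlex itself.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple


def first_detections(alerts: Iterable[Tuple[str, datetime]]) -> Dict[str, datetime]:
    """Keep only the earliest alert time per host, so each bot is counted once."""
    first: Dict[str, datetime] = {}
    for host, when in alerts:
        if host not in first or when < first[host]:
            first[host] = when
    return first


def cumulative_counts(first: Dict[str, datetime], start: datetime, bins: int,
                      width: timedelta = timedelta(minutes=30)) -> List[int]:
    """Number of hosts first detected by the end of each successive window."""
    return [
        sum(1 for when in first.values() if when <= start + i * width)
        for i in range(1, bins + 1)
    ]


def max_lag(reference: List[int], tool: List[int]) -> int:
    """Largest shortfall of the tool behind the reference across all windows."""
    return max(r - t for r, t in zip(reference, tool))


# Toy data, not taken from the trace.
t0 = datetime(2012, 9, 18, 9, 0)
truth = first_detections([("a", t0 + timedelta(minutes=10)),
                          ("b", t0 + timedelta(minutes=20)),
                          ("c", t0 + timedelta(minutes=65))])
tool = first_detections([("a", t0 + timedelta(minutes=40)),
                         ("b", t0 + timedelta(minutes=15)),
                         ("a", t0 + timedelta(minutes=55)),  # repeat alert, ignored
                         ("c", t0 + timedelta(minutes=110))])
truth_curve = cumulative_counts(truth, t0, bins=4)
tool_curve = cumulative_counts(tool, t0, bins=4)
```

In this toy example the tool trails the reference by at most one host in any window; the same `max_lag` computation over the real curves would give the lag figures quoted in the text.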
5.4. Run-time performance

We now evaluate the run-time performance of BotFlex with the aim of understanding its practicality for a real deployment. We base our assessment on two metrics, memory usage and packet processing throughput. Furthermore, accommodation of multiple correlation filters (Section 4.3.1) is a unique feature of BotFlex. We therefore compare memory usage and throughput for an instance of BotFlex using a single correlation filter with one using 25 filters.

5.4.1. Packet processing throughput

In order to observe the processing overhead of BotFlex, we measure the packet throughput achievable while BotFlex is running. To formulate a baseline, we compare the throughput of BotFlex with the throughput of the underlying platform, Bro. We also evaluate the effect on throughput of increasing the number of correlation filters over different runs. To measure throughput, we calculate the total packets in our trace and the time it takes for a tool to run on the entire trace.

Bro processed our test traffic at 51 K packets/second, much faster than the speed at which packets were captured (28 K packets/second). BotFlex (with 1 correlation filter) processed traffic at 47 K packets/second. Interestingly, increasing the number of BotFlex correlation filters to 25 had a negligible effect on its throughput (dropping to 46 K packets/second). Thus we conclude that BotFlex reduces Bro's throughput by only 4 K packets/second, and that increasing the number of BotFlex correlation filters has only a marginal impact on throughput.
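The throughput figures above can be put in relative terms with a few lines of arithmetic. The sketch below only restates the reported rates (51 K, 47 K, 46 K and 28 K packets/second); the helper names are hypothetical, and the percentages are derived here rather than quoted from the paper.

```python
def throughput_pps(total_packets: int, elapsed_seconds: float) -> float:
    """Packets per second: total packets in the trace over wall-clock run time."""
    return total_packets / elapsed_seconds


def relative_drop(baseline_pps: float, measured_pps: float) -> float:
    """Fraction of the baseline throughput lost by the measured configuration."""
    return (baseline_pps - measured_pps) / baseline_pps


# Rates as reported in the text, in packets per second.
BRO_PPS = 51_000
BOTFLEX_1_FILTER_PPS = 47_000
BOTFLEX_25_FILTERS_PPS = 46_000
CAPTURE_PPS = 28_000

drop_1 = relative_drop(BRO_PPS, BOTFLEX_1_FILTER_PPS)     # about 7.8% below Bro
drop_25 = relative_drop(BRO_PPS, BOTFLEX_25_FILTERS_PPS)  # about 9.8% below Bro
headroom = BOTFLEX_1_FILTER_PPS / CAPTURE_PPS             # about 1.68x capture rate
```

Seen this way, BotFlex with a single filter still processes packets roughly 1.7 times faster than they arrived on the monitored link.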
5.4.2. Memory utilization

Behavior-based intrusion detection systems can potentially consume a large amount of memory depending on their state maintenance logic for detection of threats. These systems run the risk of failing in case of limited memory resources or if memory consumption grows exponentially. Hence we choose memory utilization as another metric for evaluating BotFlex's performance.

Figure 10 shows memory consumption of BotFlex and Bro over our test trace file. BotFlex uses constant memory over time, with a slight increase towards the end of the trace when it analyzes all accumulated state before purging it. BotFlex consumes nearly twice as much memory as its underlying platform, Bro. We observe BotFlex's memory usage in a series of test runs and find that its attack module and correlation framework consume excessive memory. We thus plan to focus on memory optimization for the identified modules as part of future work.

We expect BotFlex's memory usage to improve once probabilistic data structures are fully integrated into the underlying platform Bro (Bro, 2013a). For large volumes of network traffic, maintaining even a small amount of state consumes a lot of memory. Probabilistic data structures mitigate this limitation by trading off accuracy for memory. Further improvement may also be achieved through contributions from the community for developing optimized detection mechanisms.

5.4.3. CPU utilization

We recorded CPU utilization of Bro with and without BotFlex while running it in pseudo-realtime mode (Bro, 2013b) over the 500 GB trace obtained from the ISP at a link with an average data rate of ≈28 K packets per second (described earlier in Section 5.1). As such, Bro imitates handling of real-time network traffic by injecting artificial delays into packet processing based on timestamp differences between successive packets. We note in Fig. 11 that Bro with and without BotFlex has a consistent CPU usage of 50%, with negligibly small spikes in the case of the latter. We conclude that BotFlex takes longer to process the 500 GB trace (reflected by the longer processing duration in Fig. 11) but does not lead to a significant increase in Bro's CPU usage.
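The idea behind pseudo-realtime replay, delaying each packet by the timestamp gap to its predecessor, can be illustrated in a few lines. This is only a sketch of the general technique and not Bro's implementation: the `Packet` tuple layout, the `speedup` parameter and the injectable `sleep` callable are assumptions made so the example is self-contained and testable.

```python
import time
from typing import Callable, Iterable, List, Tuple

Packet = Tuple[float, bytes]  # (capture timestamp in seconds, raw bytes)


def replay(packets: Iterable[Packet],
           handle: Callable[[float, bytes], None],
           speedup: float = 1.0,
           sleep: Callable[[float], None] = time.sleep) -> None:
    """Feed packets to `handle`, pausing by the inter-packet timestamp gap.

    A `speedup` of 2.0 halves every pause; `sleep` can be swapped out in tests.
    """
    prev_ts = None
    for ts, payload in packets:
        if prev_ts is not None:
            gap = (ts - prev_ts) / speedup
            if gap > 0:
                sleep(gap)  # artificial delay standing in for real elapsed time
        handle(ts, payload)
        prev_ts = ts


# Demonstration with a recording stand-in for sleep, so nothing actually blocks.
trace: List[Packet] = [(0.0, b"a"), (0.5, b"b"), (2.0, b"c")]
delays: List[float] = []
seen: List[bytes] = []
replay(trace, lambda ts, payload: seen.append(payload), sleep=delays.append)
```

Because processing is paced by the capture timestamps, a replay like this takes roughly as long as the original capture, which is why CPU usage can be plotted against wall-clock time as in Fig. 11.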
Fig. 10. Memory consumption comparison of BotFlex and Bro with window size of 1 h (x-axis: trace completed (%); y-axis: memory used (MB)).

6. Conclusion and future directions
The botnet research community currently lacks an open-source, community-driven tool with which to easily develop, improve upon, and/or benchmark existing and new botnet detection solutions. In this paper, we presented BotFlex, a domain-specific, flexible and extensible network-based tool for botnet detection. We evaluated BotFlex for accuracy and performance while comparing with
Fig. 11. CPU usage of Bro (left) and BotFlex (right) while processing 500 GB trace (x-axis: time (minutes); y-axis: CPU usage (%)).
relevant baseline tools, and found the results to be more than encouraging. We now list some directions on which future work can focus.

Sensor module alerts: BotFlex's sensor module is currently driven by a basic set of alerts which needs to be extended. A number of potential badness indicators can be added to the sensor module. Vertical correlation can benefit from monitoring access to web content served on Fast Flux Service Networks (Passerini et al., 2008), periodic data transfers, and payload anomaly based exploit detection (Wang, 2007). Similarly, horizontal correlation can be extended to include SSH brute forcing, bulk data transfers and C&C (Strayer et al., 2008). BotFlex also currently cannot manipulate all malware exploit and C&C signatures compiled by the security community, as these occur in custom syntax. BotFlex would benefit from automatic translation of well known third-party signatures to a format it can directly consume. Lastly, though not employed by BotFlex at the moment, a vein of botnet security research (Bilge et al., 2012b) consumes NetFlow data. A future extension can be to incorporate a NetFlow-only or NetFlow-enhanced operational mode into BotFlex (the underlying NIDS, Bro, provisions for accessing and processing NetFlow records).

Correlation module: BotFlex's decision engine, the correlation framework, can be extended in a number of ways. First, the community can extend the rule base for botnet detection. Second, the framework utilizes a custom rule language for defining correlation policies that can be enhanced by supporting (i) definition of nested correlation policies, for example 'rule-Y AND (rule-X in time T1) in time T2', (ii) replacing the current embedded C++ code for rule expression with an external system that does not require compilation, and (iii) event parametrization, i.e. being able to manipulate event parameters in contrast to having a binary perception of events. Finally, the framework presently organizes event history per index. It would be useful to have the option to add some data per index instead of history. This is particularly useful for aggregation based analysis where data of the same type is being measured. We will then also need to add language support for operations like UNIQUE, ENTROPY, MEDIAN, etc.

Performance enhancement: We have carried out a preliminary performance evaluation of BotFlex. However, further work is required to make BotFlex scale to increased traffic volumes while also taking its underlying NIDS platform into account. Support for running BotFlex over a cluster is an important extension, as the underlying NIDS, Bro, is capable of running in clusters in large deployment scenarios.
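To make the proposed aggregation operations concrete, the sketch below shows what UNIQUE, ENTROPY and MEDIAN could compute over data collected per index (for example, per host). It is an illustration only and does not reflect BotFlex's rule language: the function names, the per-index dictionary and the example values are all hypothetical.

```python
import math
from collections import Counter, defaultdict
from statistics import median
from typing import DefaultDict, Hashable, Iterable, List


def unique(values: Iterable[Hashable]) -> int:
    """UNIQUE: number of distinct values."""
    return len(set(values))


def entropy(values: Iterable[Hashable]) -> float:
    """ENTROPY: Shannon entropy (in bits) of the empirical value distribution."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Per-index data store: one list of same-typed measurements per host (hypothetical).
per_host: DefaultDict[str, List[int]] = defaultdict(list)
for host, dst_port in [("10.0.0.1", 25), ("10.0.0.1", 25),
                       ("10.0.0.2", 22), ("10.0.0.2", 23),
                       ("10.0.0.2", 80), ("10.0.0.2", 445)]:
    per_host[host].append(dst_port)

ports_unique = {h: unique(v) for h, v in per_host.items()}
ports_entropy = {h: entropy(v) for h, v in per_host.items()}
ports_median = {h: median(v) for h, v in per_host.items()}
```

In this toy data the first host contacts a single port (entropy 0 bits) while the second spreads evenly over four ports (entropy 2 bits), the kind of contrast an aggregation-based rule could threshold on.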
Acknowledgments

This work is supported by a grant from the Pakistan National ICT R&D Fund.
References

Abu Rajab M, Zarfoss J, Monrose F, Terzis A. A multifaceted approach to understanding the botnet phenomenon. In: Proceedings of the 6th ACM SIGCOMM conference on Internet measurement. IMC '06. New York, NY, USA: ACM; 2006. p. 41–52.
Zand A, Vigna G, Yan X, Kruegel C. Extracting probable command and control signatures for detecting botnets. In: Proceedings of the 29th annual ACM symposium on applied computing (SAC '14). New York, NY, USA: ACM; 2014. p. 1657–62. http://dx.doi.org/10.1145/2554850.2554896.
Amann B, Sommer R, Sharma A, Hall S. A lone wolf no more: supporting network intrusion detection with real-time intelligence. In: Proceedings of the 15th international conference on research in attacks, intrusions, and defenses. RAID'12. Berlin, Heidelberg: Springer-Verlag; 2012. p. 314–333.
Bilge L, Balzarotti D, Robertson W, Kirda E, Kruegel C. Disclosure: detecting botnet command and control servers through large-scale netflow analysis. In: Proceedings of the 28th annual computer security applications conference. New York, NY, USA: ACM; 2012a. p. 129–138.
Bilge L, Balzarotti D, Robertson W, Kirda E, Kruegel C. Disclosure: detecting botnet command and control servers through large-scale netflow analysis. In: Proceedings of the 28th annual computer security applications conference. ACSAC '12. New York, NY, USA: ACM; 2012b. p. 129–138.
Binkley JR, Massey B. Ourmon and network monitoring performance. In: Proceedings of the annual conference on USENIX annual technical conference. ATEC '05. Berkeley, CA, USA: USENIX Association; 2005. p. 46.
BotFlex. 2013. Available from: 〈http://www.sysnet.org.pk/BotFlex〉.
Bro. Probabilistic counting; 2013a. Available from: 〈http://www.bro.org/development/projects/counting.html〉.
Bro. 2013b. Available from: 〈http://www.bro.org/〉.
Burghouwt P, Spruit M, Sips H. Detection of covert botnet command and control channels by causal analysis of traffic flows. Cyberspace safety and security. Springer; 2013. p. 117–31. http://dx.doi.org/10.1007/978-3-319-03584-0_10.
CBL. 2013. Available from: 〈http://www.cbl.abuseat.org/advanced.html〉.
CERT Polska. 2013. Available from: 〈http://www.cert.pl/PDF/Report_Virut_EN.pdf〉.
Chen R, Qiao L, Zhang B, Gong Z. A framework of event-driven detection system for intricate network threats. In: International conference on computer, networks and communication engineering (ICCNCE 2013). Atlantis Press; 2013.
Choi H, Lee H, Lee H, Kim H. Botnet detection by monitoring group activities in dns traffic. In: Proceedings of 7th IEEE international conference on computer and information technology (CIT 2007). 2007.
Goebel J, Holz T. Rishi: identify bot contaminated hosts by irc nickname evaluation. In: USENIX workshop on HotTopics in understanding Botnets (HotBots'07). 2007.
Gu G, Perdisci R, Zhang J, Lee W. Botminer: clustering analysis of network traffic for protocol- and structure-independent botnet detection. In: Proceedings of the 17th conference on security symposium. SS'08. Berkeley, CA, USA: USENIX Association; 2008a. p. 139–54.
Gu G, Porras P, Yegneswaran V, Fong M, Lee W. Bothunter: detecting malware infection through ids-driven dialog correlation. In: Usenix security symposium. 2007.
Gu G, Yegneswaran V, Porras P, Stoll J, Lee W. Active botnet probing to identify obscure command and control channels. In: Proceedings of the 26th annual computer security applications conference (ACSAC). 2010.
Gu G, Zhang J, Lee W. Botsniffer: detecting botnet command and control channels in network traffic. In: Network and distributed system security symposium (NDSS). 2008b.
Haq O, Ahmed W, Syed AA. Titan: enabling low overhead and multi-faceted network fingerprinting of a bot. In: Proceedings of the 2014 44th annual IEEE/IFIP international conference on dependable systems and networks. DSN '14. Washington, DC, USA: IEEE Computer Society; 2014. p. 37–44.
Khan H, Mirza F, Khayam SA. Determining malicious executable distinguishing attributes and low-complexity detection. J Comput Virol May 2011;7(2):95–105.
Khattak S, Ahmed Z, Syed AA, Khayam SA. Poster: botflex: a community-driven tool for botnet detection. In: Poster session, ACM international conference on computer and communication security (CCS). November 2013.
Liu L, Chen S, Yan G, Zhang Z. Bottracer: execution-based bot-like malware detection. In: 11th information security conference. 2008.
Luckham DC. The power of events: an introduction to complex event processing in distributed enterprise systems. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc.; 2001.
Nayatel. 2013. Available from: 〈http://www.nayatel.pk/〉.
Papadogiannakis A, Polychronakis M, Markatos EP. Scap: stream-oriented network traffic capture and analysis for high-speed networks. In: Internet measurement conference. 2013. p. 441–54.
Passerini E, Paleari R, Martignoni L, Bruschi D. Fluxor: detecting and monitoring fast-flux service networks. In: Proceedings of the 5th international conference on detection of intrusions and malware, and vulnerability assessment. DIMVA '08. Berlin, Heidelberg: Springer-Verlag; 2008. p. 186–206.
Paxson V. Bro: a system for detecting network intruders in real-time. In: Proceedings of the 7th conference on USENIX security symposium, vol. 7. SSYM'98. Berkeley, CA, USA: USENIX Association; 1998. p. 3.
Provos N, McNamee D, Mavrommatis P, Wang K, Modadugu N. The ghost in the browser: analysis of web-based malware. In: Proceedings of the first conference on first workshop on hot topics in understanding botnets. HotBots'07. Berkeley, CA, USA: USENIX Association; 2007. p. 4.
Ramachandran A, Feamster N, Dagon D. Revealing botnet membership using dnsbl counter-intelligence. In: Conference on steps to reducing unwanted traffic on the internet (SRUTI). 2006.
Roesch M. Snort – lightweight intrusion detection for networks. In: Proceedings of USENIX LISA'99. 1999.
Shin S, Xu Z, Gu G. Effort: a new host-network cooperated framework for efficient and effective bot malware detection. Comput Netw 2013;57(13):2628–42.
Strayer T, Lapsley D, Walsh R, Livadas C. Botnet detection based on network behavior. Advances in Information Security, 36. Springer; 2008. p. 1–24. http://dx.doi.org/10.1007/978-0-387-68768-1_1.
Strayer T, Walsh R, Livadas C, Lapsley D. Detecting botnets with tight command and control. In: Proceedings 2006 31st IEEE conference on local computer network. 2006.
Tracker P. 2013. Available from: 〈https://www.palevotracker.abuse.ch/〉.
Villamarin-Salomon R, Brustoloni J. Identifying botnets using anomaly detection techniques applied to dns traffic. In: Proceedings of 5th IEEE consumer communications and networking conference (CCNC 2008). 2008.
Wang K. Network payload-based anomaly detection and content-based alert correlation [Ph.D. thesis]. New York, NY, USA; 2007. aAI3249142.
Wang T, Yu S-Z. Centralized botnet detection by traffic aggregation. In: IEEE international symposium on parallel and distributed processing with applications. 2009.
Yadav S, Reddy ALN. Winning with DNS failures: strategies for faster botnet detection. In: 7th international icst conference on security and privacy in communication networks (SecureComm). London, United Kingdom. September 2011.
Yan G. Peri-watchdog: hunting for hidden botnets in the periphery of online social networks. Comput Netw 2013;57(2):540–55.
Yen T-F, Oprea A, Onarlioglu K, Leetham T, Robertson W, Juels A, et al. Beehive: large-scale log analysis for detecting suspicious activity in enterprise networks. In: Proceedings of the 29th annual computer security applications conference. New York, NY, USA: ACM; 2013. p. 199–208.
Yen T-F, Reiter MK. Traffic aggregation for malware detection. In: Conference on detection of intrusions and malware & vulnerability assessment (DIMVA). 2008.
Zhuang L, Dunagan J, Simon D, Wang H, Osipkov I, Hulten G, et al. Characterizing botnets from email spam records. In: USENIX workshop on large-scale exploits and emergent threats. 2008.