IP Traceback using header compression


IP Traceback using header compression Hassan Aljifri, Marcel Smets and Alexander Pons University of Miami, PO Box 248818, Coral Gables, FL 33124, USA; email: [email protected]; tel: +1 305 284 4767; Fax: +1 305 284 5161

Abstract

Denial-of-service and other malicious attacks have become increasingly prevalent in recent years. A major issue hindering the ability to trace attacks to their sources is the ease of IP address spoofing which conceals the attackers’ identity. Several techniques, generically named IP traceback, have been proposed to enable tracing of IP packets from destination to source despite IP spoofing. In this paper, we propose a Simple, Novel IP Traceback using Compressed Headers (SNITCH) that is based upon Probabilistic Packet Marking (PPM). This technique employs header compression to increase the number of bits available for insertion of traceback information. Simulations performed on empirical data have shown that 100% of the attack paths can be determined with a maximum of 0.43% false positive paths.

Keywords: IP Traceback, DoS, Header Compression, PPM

1 Introduction

Computers & Security Vol 22, No 2, pp136-151, 2003 Copyright ©2003 Elsevier Science Ltd Printed in Great Britain All rights reserved 0167-4048/03


Malicious computer attacks are non-trivial events that can have a substantial economic impact on the victim. According to the 2002 CSI/FBI Computer Crime and Security Survey [1] financial losses reported by survey respondents for all computer related crimes totaled approximately $1.46 billion between 1997 and 2001. During the first five months of 2002, 40% of survey respondents reported experiencing Denial of Service (DoS) attacks, a 67% increase since 1997. As of May 2002 these respondents reported losses totaling greater than $18 million, an increase of 429% over losses incurred in all of 2001. Direct costs attributable to DoS and/or DDoS attacks result from the consumption of available resources and

increased circuit costs, caused by higher traffic levels, in systems where billing is based on the provisioning of measured-use circuits [2]. Current DoS attack trends include: a higher degree of automation and self-propagation employed by the attack tools, targeting of systems known to have potentially exploitable vulnerabilities, and targeting of end users perceived as being less security conscious [3]. The status of DoS attacks on the Internet has remained essentially unchanged, with most DoS payloads changing little since 1999. This can be attributed in part to the existence of vulnerable systems that remain unpatched, misconfigured or insecurely managed. In addition, attack tool lifecycles lasting two to three years have been recorded even though the security community and system vendors have made an effort to raise awareness of serious security issues [2]. Attackers can utilize well-known DoS algorithms because they remain effective in accomplishing their objectives. Of importance in the security community is the identification of these attackers. The ability to ascertain the source of DoS attacks becomes complicated with the relative ease of IP address spoofing [4], which enables attackers to remain anonymous to their victims. The ability to reliably and expeditiously trace packet streams back to their sources would facilitate the identification and prosecution of the perpetrators and provide some measure of deterrence to risk-averse individuals [5]. A number of implementation strategies for IP traceback have been proposed and are discussed in Section 2. Several of the most practical IP traceback strategies encode attack path information within the limited IP header space of packets routed through the network(s) [6-9]. Space constraints preclude the ability to store full path information within each packet. To


overcome this limitation, partial path information is encoded in selected packets as determined by Probabilistic Packet Marking (PPM) techniques [8]. The use of PPM ensures that path information cannot be falsified between the attacker and the victim. Encoding schemes can also fragment information from each node in an effort to decrease storage requirements, but this ultimately results in a greater number of packets required for complete path reconstruction. We propose a method that will decrease the number of packets required for complete path reconstruction by circumventing the space limitations of current IP traceback schemes. Our method is founded upon the concept of IP Header Compression as described in RFC 2507 and RFC 1144 [10, 11]. According to RFC 2507, headers that can be compressed include TCP, UDP, IPv4 and IPv6 base and extension headers. This concept is based on the supposition that up to 50% of the TCP/IP header contents do not change during a session. If an initial frame is sent with a full header, subsequent frames can be sent without the static content (the context) being included in the header. A context identifier is inserted into full and compressed headers, which is used to identify the proper context required during the decompression phase. A packet with a full header that includes a new context identifier is transmitted whenever the context changes. The aim is to decrease packet sizes, resulting in increased line efficiency over slow links. Instead of decreasing the packet size we propose to utilize the available space for insertion of IP traceback information. The amount of accessible space is 16 bytes as compared to the 2 bytes available to existing IP traceback methods. The additional storage capacity enables more traceback information to be stored in each packet than is currently possible. The result is a decrease in the total number of packets necessary for path reconstruction, a greater degree of protocol

flexibility, and a reduction in the time required to determine the attacker. The manuscript is divided as follows: Section 2 is a description of related work, Section 3 provides an overview of the proposed methods, Section 4 provides a discussion of the advantages and disadvantages of the described methods, and Section 5 summarizes our proposal.

2 Related work

DoS/DDoS attacks target the finite resources of a victim’s system resulting in varying degrees of service disruption in order to prevent legitimate use by authorized users. Resources targeted include bandwidth, processing power, and storage capacities [2]. Defending computer systems against DoS/DDoS attacks has proven to be challenging. Measures must be taken to prevent or mitigate attacks and to respond to attacks. Measures taken to prevent or mitigate the effects of attacks are categorized as preventative, while measures that respond to attacks can be categorized as reactive or proactive [12]. Preventative measures include optimization of software parameters [13], ingress filtering [14], and rate limiting [15]. In response to continuing or recurring attacks, positive identification of the attack source is imperative. The ability to trace packets from destination to source is known as IP traceback. The stateless nature of the IP routing protocol and the ease of IP address spoofing make IP traceback problematic. Recent research has focused on a variety of IP traceback methods to identify the source(s) of computer system attacks. Reactive measures include input debugging [16] and controlled flooding [17], while proactive measures include logging [12, 18-20], messaging [21, 14, 22] and packet marking [17, 6-9].

Hassan Aljifri is an assistant professor in the Computer Information Systems Department at the University of Miami. He received a Ph.D. in Computer Engineering from the University of Miami. Most of Dr. Aljifri's recent research has been focused on methodology, framework, and techniques for designing secure systems.

Alexander Pons is an assistant professor in the Computer Information Systems Department at the University of Miami. For the past several years he has been involved in various aspects of real-time systems and databases as a researcher and developer. His research interests include real-time systems, programming languages, databases and Internet technology.

Marcel Smets holds an MS in Computer Information Systems as well as a Master's Certificate in Telecommunications Management from the University of Miami School of Business. Mr. Smets is currently employed as a Research Associate in the Department of Medicine at the University of Miami School of Medicine.

2.1 Reactive measures

A measure is considered reactive when the traceback process is initiated in response to an attack. Reactive measures include input


debugging and controlled flooding. These measures are initiated after an attack has been detected, but must be completed while the attack is still active as they are ineffective once the attack ceases.

2.1.1 Input debugging

Input debugging is implemented at the router level. A filter is placed on the egress port of the router upstream to the victim and the ingress port is determined by matching the ‘attack signature’ in packets arriving at the router. Once an ingress port has been identified the procedure is repeated at upstream routers, hop-by-hop, until the originating network is identified. Disadvantages to this technique include: the input debugging feature must be present on all routers, dependence on manual intervention, the high degree of ISP cooperation necessary, the relative slowness of the process, and the requirement that the attack stream remain active, which rules out post-mortem analysis. MCI’s DoSTracker (a Cisco-specific Perl script) was an effort to automate these tasks, but is no longer officially distributed or supported [16]. Another variation on input debugging attempts to determine attack sources via hop-by-hop tracking using an IP overlay network [16]. This method employs IP tunnels created between edge and transit routers. Traffic destined for the victim is rerouted to special tracking routers when an attack signature is matched. The system then performs hop-by-hop input debugging starting from the tracking router closest to the victim to determine the attack path. Advantages include: an overall decrease in the number of hops that need to be determined, required features already available on high-speed transit routers, ability to survive changes to the backbone architecture, and scalability. Disadvantages include: input debugging is still required, there is an additional administrative burden, attacks from within the backbone are difficult to track, and an attacker can detect the change in route (via traceroute) and thus be alerted to the tracking process.


2.1.2 Controlled flooding

This approach works by flooding upstream links with large amounts of UDP-Chargen traffic and observing whether this decreases the attack traffic rate [17]. When an affected link is identified, cooperative hosts upstream must be willing to repeat this process recursively until a specific network or network segment where the attack appears to originate is identified. The appropriate network operator would then be contacted in order to ultimately identify the attack source. The major advantages are the effectiveness and ease of implementation. Disadvantages include: the method itself is a form of DoS attack, the high degree of upstream host cooperation required, the requirement for manual application of loads, difficulty discerning multiple paths during DDoS attacks, and the requirement of a detailed topological map of the Internet.

2.2 Proactive measures

A measure is considered proactive when tracing information is concurrently generated as packets are routed through the network for subsequent attacker identification. When deemed necessary, the resulting traceback data is used for attack path reconstruction. Examples of proactive measures include logging, messaging and packet marking.

2.2.1 Logging

An audit trail can be established at routers or other specialized hardware by logging information as packets are routed through the network. Logged data can be retrieved for attack path reconstruction and response. A number of logging schemes have been proposed. The Intrusion Detection and Isolation Protocol (IDIP) [18] and the CITRA architecture [19] enable network components to exchange tracing information about system attacks so that an automated response can be triggered. Audit information is logged when nodes detect an intrusion and an attack report is forwarded to neighboring nodes to help trace the path. The


system can trace intrusions across network boundaries and block the attack at boundary controllers nearest the attack source. Centralized reporting and coordination of responses between IDIP components allow the system to kill connections, modify filtering rules, disable accounts and reverse previous actions when deemed appropriate. Responses are limited to those that will cause less damage than the attack itself. Traceback is only supported within the IDIP-enabled network.

The Source Path Isolation Engine (SPIE) [20] consists of multiple components under centralized control that interface with an IDS and respond to trace requests. A SPIE-enhanced router computes the hash of multiple fields in the IP packet header and logs the resulting digests in tables using space-efficient Bloom filters. Each digest table is also associated with a time interval and the set of hash functions used to compute the digests over that time interval. The system is capable of handling packet transformations by storing digests of transformed packets in a Transform Lookup table. When a traceback request is made the digest tables for a specified time period are transferred to longer-term storage for analysis since router storage duration is approximately 1 minute. A central controller that interfaces with the user’s IDS handles attack graph reconstruction and the results are passed back to the requesting system. Traceback of single packets can be accomplished.

Baba and Matsuda [12] describe a multicomponent logging system that uses datalink-level identifiers to determine nodes along an attack path. Each node stores multiple fields from the IP header (packet feature structure) and the datalink-level identifier from the previous node. Each forwarding node then changes the datalink-level identifier to its own identifier and sends the packet to the next node. When an attack is detected the system issues a tracing request. The datalink-level identifier that is transmitted with a packet identifies adjacent nodes. The adjacent node is queried

for the packet feature structure of an attack packet and its associated datalink-level identifier will identify the upstream node in the attack path. This procedure is repeated recursively until the attack source has been identified. Traceback of single packets can be accomplished.

2.2.2 Messaging

Another approach to the traceback problem is to use out-of-band signaling to aid in attack path reconstruction. A proposal by Bellovin [21] has routers along an attack path probabilistically (for about 1 in 20 000 packets) generate an ICMP traceback (iTrace) message that is sent to the destination. Each iTrace message would include partial path information including: previous and next hop IP addresses, a timestamp, as much of the traced packet as will fit, and authentication to prevent tampering. If a sufficient number of traceback packets are received, the complete attack path can be reconstructed. Enhancements to this basic protocol have been proposed to improve performance [23, 22].

2.2.3 Packet marking

Burch and Cheswick [17] proposed that traceback information could be stored within the IP packets themselves. This packet-marking scheme could include the IP addresses of all of the routers along the attack path in each packet (deterministic packet marking, DPM) or it could mark only one in n packets that are routed through the network (probabilistic packet marking, PPM). Implementation of DPM is problematic due to insufficient storage space within the packets and the increased router overhead associated with marking every packet if sufficient space were available. PPM is subject to the same storage space limitation in addition to the large number of packets that must be received in order to infer router order during attack path reconstruction. Savage [8] describes a variation of PPM in order to overcome these limitations. The concept of


compressed edge fragment sampling is introduced. The IP Fragment Identification field (IP ID) within the IP header is used to store the traceback information. The IP ID is partitioned into three fields: an edge fragment (EF) field, a fragment offset (FO) field and a distance field (DIST). When a router decides to probabilistically mark a packet the router calculates a hash of its IP address and bit-interleaves it with its IP address. This bit-interleaved combination is broken up into fragments, each identified by an FO value; a selected fragment is stored in the EF field along with its FO value, and the DIST field is set to zero. The next router checks the DIST field and, if it is set to zero, the router computes a hash of its IP address and bit-interleaves the hash with its IP address. The router fragments this bit-interleaved combination, examines the FO field in the IP ID of the packet received and XORs the appropriate fragment bearing the same FO value with the fragment currently stored in the EF field, then sets the DIST field to one. This results in the ability to store the partial addresses of two consecutive routers (edge-ID). When a router does not probabilistically mark a packet it is mandatory that the DIST field be incremented. Packets received by the victim will contain fragments of edge-IDs or will arrive with unmodified fragments from one hop away. Fragments received from the last router can be used to decipher the prior edge-ID. This process is repeated recursively until the first router is reached. Deciphered fragments with the same DIST value are combined and bit-deinterleaved. The hash of the resulting IP address is compared to the hash that was bit-interleaved. If the hashes are identical the address is considered valid; otherwise the address is discarded. The final attack path is reconstructed from the combination of valid edge addresses.

Song and Perrig [9] proposed modifications to Savage’s edge-ID based PPM method to further reduce the storage space requirements by storing


a hash of the IP address instead of the address itself. The victim is assumed to possess a complete network map of all upstream routers. After edge-fragment reassembly the resulting IP address hashes would be compared to the router IP address hashes derived from the network map to facilitate attack path reconstruction. This modified method is claimed to be more effective against DDoS attacks than previous methods. The authors also propose an Authentication Marking scheme that uses Message Authentication Codes (MAC) to prevent packet content tampering by compromised routers along the attack path.

Park and Lee [7] assessed the effectiveness and limitations of PPM methods using a constrained minimax optimization approach. Variables that affect the ability of a victim to localize an attacker include: packet marking probability, path length, number of simultaneous attack paths, and attack traffic volume. The authors conclude that PPM is effective against DoS attacks but may be less effective for DDoS attacks due to an uncertainty amplification effect.

Dean [6] proposed a modified PPM method that employs algebraic techniques from the fields of coding theory and machine learning to encode/decode path information as points on polynomials. The authors describe schemes for full path encoding, randomized path encoding and edge encoding. Several schemes for attack path reconstruction are presented based on whether path information was encoded with or without the use of a distance field. Encoded path information can be stored in the FLAGS (1-bit), TOS (8-bit), and IP ID (16-bit) fields of the IP header. This method can be implemented as a PPM technique or as an out-of-packet scheme.
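The bit-interleaving, fragmentation and hash check used in compressed edge fragment sampling, as described earlier in this subsection, can be illustrated with a minimal Python sketch. The function names, the use of CRC-32 as a stand-in hash, and the choice of eight equal fragments are assumptions made purely for illustration; the description above does not fix any of them.

```python
import zlib

WIDTH = 32  # bits in an IPv4 address and in the stand-in hash


def address_hash(addr: int) -> int:
    """Stand-in 32-bit hash (CRC-32); the real hash function is an assumption."""
    return zlib.crc32(addr.to_bytes(4, "big")) & 0xFFFFFFFF


def bit_interleave(addr: int, digest: int) -> int:
    """Place address bits at odd positions and hash bits at even positions."""
    word = 0
    for i in range(WIDTH):
        word |= ((addr >> i) & 1) << (2 * i + 1)
        word |= ((digest >> i) & 1) << (2 * i)
    return word


def bit_deinterleave(word: int) -> tuple[int, int]:
    """Recover (address, hash) from an interleaved 64-bit word."""
    addr = digest = 0
    for i in range(WIDTH):
        addr |= ((word >> (2 * i + 1)) & 1) << i
        digest |= ((word >> (2 * i)) & 1) << i
    return addr, digest


def split_fragments(word: int, count: int = 8) -> list[int]:
    """Split the interleaved word into `count` fragments; the list index is the offset."""
    size = 2 * WIDTH // count
    mask = (1 << size) - 1
    return [(word >> (size * offset)) & mask for offset in range(count)]


def join_fragments(fragments: list[int]) -> int:
    """Reassemble fragments (ordered by offset) into the interleaved word."""
    size = 2 * WIDTH // len(fragments)
    word = 0
    for offset, fragment in enumerate(fragments):
        word |= fragment << (size * offset)
    return word


def is_valid(word: int) -> bool:
    """Victim-side check: keep the address only if its hash matches the embedded hash."""
    addr, digest = bit_deinterleave(word)
    return address_hash(addr) == digest
```

Interleaving spreads the hash across every fragment, so a wrongly combined set of fragments is unlikely to pass the final hash comparison.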

3 Overview of SNITCH

In this paper we focus on attacker recognition by proposing a new method based on packet


marking. PPM is an efficacious solution to the traceback problem with modest implementation requirements as compared to other proposed traceback methods. Some of the advantages of packet marking include a lack of dependence on inter-ISP cooperation, not being a form of DoS attack itself, no need for specialized hardware or extensive infrastructure, and low router or network overhead. However, a major limitation constraining PPM traceback methods is the restricted amount of storage space within the IP header. Compressed edge fragment sampling [8] has been proposed to circumvent this restriction. While this is an ingenious solution it increases the complexity of path reconstruction and requires an increased volume of packets. PPM methods would be greatly enhanced if sufficient storage space could be found to store edge information without fragmentation. SNITCH is a modified PPM technique that utilizes space in the IP header made available by the currently implemented IP header compression technique as described in RFC 2507/RFC 1144. Our method extends the approach of Savage [8] by overcoming the space limitation for IP traceback data insertion that hinders existing IP traceback methodologies. SNITCH is able to determine 100% of the attackers with an extremely low percentage of false positive paths (maximum of 0.43% for 5067 simultaneous attackers) using significantly fewer packets than present techniques.

3.1 Basic assumptions

A single traceback design cannot address every potential attack scenario. Several basic assumptions need to be made in order to define the design boundaries.

• Attackers are able to generate any packet.
• Attackers send a multitude of packets.
• Multiple attackers may act in a coordinated fashion.
• Attackers are cognizant of traceback ability.
• Routers are infrequently compromised.
• Attack path remains stable.
• Routers possess limited processing and storage capabilities.
• Each attacker will send at least the minimum number of packets with the same source address.

The vast array of attack tools currently available enables attackers to generate any packet type desired. A large volume of packets bombards the victim during packet flood DoS attacks. An attacker can choose from a variety of packet flooding strategies including: single source, single stream attacks; single source, multiple stream attacks that use networks with broadcast capabilities as amplifiers (e.g., Smurf, PapaSmurf, Fraggle); coordinated, multiple source attacks (DDoS). Most attackers are aware of the potential for traceback and routinely employ IP address spoofing and/or indirection to avoid identification. Development of accurate and reliable traceback methods would facilitate the discovery and prosecution of would-be attackers. In addition, the rapid identification of attack sources would enable the victim to mount timely and effective responses to these attacks.

For packet marking schemes to be effective the routing path must remain stable over the time period required for attack source identification. Changes in path routing during DoS/DDoS attacks would complicate attack path reconstruction and require the receipt of an increased number of packets by the victim. Paxson [23] has shown that >90% of the routing paths used in the Internet remain stable for hours, sufficient time to complete the traceback process.

Routers are highly specialized devices optimized for maximal throughput with bandwidths


currently in the Gbps range. Due to this dedicated nature routers possess limited resources for additional tasks. Thus, it is mandatory that traceback implementations incur minimal per-packet overhead to avoid having a negative impact on router performance. The victim will need to receive a minimum number of packets with the same source address in order to accurately reconstruct the attack paths. This minimum number of packets necessary for path reconstruction is proportional to the number of hops in the attack path.

3.2 RFC 2507/RFC 1144

Our proposal is based on the IP Header Compression schemes described in RFC 2507 [10] and RFC 1144 [11]. The purpose of these protocols is to improve line efficiency (increase throughput) over low bandwidth links as a result of packet size reduction due to decreased header overhead. These header compression schemes assume that a number of header fields remain constant or change infrequently in sequential packets within the same packet stream. Figure 1 shows the IP header fields; the shaded fields remain static and are available for compression. The Total Length field is used to pass the CID for full headers. The Fragment Identification field is used to differentiate standard IP Header Compression from SNITCH.

Figure 1: IP Header Fields.

RFC 2507 states:


“The general principle of header compression is to occasionally send a packet with a full header; subsequent compressed headers refer to the context established by the full header and may contain incremental changes to the context.”

In brief, compression removes information from the static header fields (the context), resulting in packet size reduction. To accomplish this goal a packet is first transmitted containing a full header bearing a context identifier (CID) number that defines the context to be used by the decompressor. The decompressor reinserts the context, referred to by the CID contained in the full header that preceded the compressed stream, into the compressed packet to restore the header. Alterations to the context result in transmission of a packet containing a full header bearing a new CID to be used for subsequent header decompression. For additional operational details the reader is referred to RFC 2507 and RFC 1144. We propose to modify this procedure by inserting traceback information into the ‘compressed’ (empty) header fields and then extracting this information before decompression. Our purpose is not to decrease packet size, but rather to utilize this ‘unused’ space to store traceback information. The amount of space resulting from IP header compression is 144 bits. Cisco Systems, which commands an 85.5% overall share of the global router market [24],


Figure 2: Traceback Header Fields.

supports RFC 2507 in a wide variety of products (2600-, 3600-, 4000-M-, 7100-, 7200-, and 7500-series routers). Rapid deployment of our new method would be facilitated by the widespread support of RFC 2507.
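The full-header/compressed-header exchange summarized in this section can be sketched in Python as follows. This is only a toy model under stated assumptions: headers are plain dictionaries, the set of static fields in `STATIC_FIELDS` is a hypothetical choice, and the `Decompressor` class and `compress` function are illustrative names rather than anything defined by RFC 2507 or RFC 1144.

```python
from dataclasses import dataclass, field

# Hypothetical set of static ("context") fields; the RFCs define the real set.
STATIC_FIELDS = ("version", "tos", "ttl", "protocol", "src", "dst")


def compress(header: dict) -> dict:
    """Drop the static fields, keeping only the fields that change per packet."""
    return {k: v for k, v in header.items() if k not in STATIC_FIELDS}


@dataclass
class Decompressor:
    # Maps a context identifier (CID) to the static fields of its full header.
    contexts: dict = field(default_factory=dict)

    def receive_full(self, cid: int, header: dict) -> dict:
        """A full header establishes (or replaces) the context for `cid`."""
        self.contexts[cid] = {k: header[k] for k in STATIC_FIELDS}
        return dict(header)

    def receive_compressed(self, cid: int, changing: dict) -> dict:
        """Reinsert the stored context to restore the original header."""
        restored = dict(self.contexts[cid])
        restored.update(changing)
        return restored
```

In SNITCH the space vacated by `compress` would carry traceback data instead of being removed, and that data would have to be read out before `receive_compressed` restores the header.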

3.3 Method overview

Header compression creates sufficient space to store edge data without fragmentation. Figure 2 shows that of the 144 bits made available by header compression, 80 bits are divided into four fields: 32 bits each for the starting and ending edge addresses (named LEFT and RIGHT, respectively), an 8-bit DIST field, and an 8-bit CID field for compressed headers. The IP header Total Length field will be used to pass the CID as the link layer implementation provides the length of the packets (inferred from the size of the link-layer frame) [10]. We will utilize the rarely used IP ID field to differentiate between compression and SNITCH as Stoica [27] has shown that less than 0.22% of all network packets were fragments. In order to differentiate between header compression and SNITCH all the bits in the IP ID field are set to ones to indicate that SNITCH is being used. Since RFC 2507 does not require all packets in a packet stream to be compressed, SNITCH will not require packets to be compressed if they are not going to be probabilistically marked. This results in reduced per-packet overhead, as not all packets will need to be modified by the router. Header compression/traceback data insertion will be performed on single packets only. When a router determines that a packet is to be marked it must first transmit a packet with a full header bearing a CID before it can

‘compress’ subsequent packets and insert the traceback data. Packets with full headers or that have traceback data inserted will have all the bits in the IP ID field set to equal one to differentiate between SNITCH and header compression. The context of the second packet is compared with the context established by the preceding packet and if found to be identical the second packet will be subject to traceback data insertion. The context established by the CID from the full header will be used to ‘decompress’ the ‘compressed’ packets bearing a matching CID. It is essential that the traceback data be extracted from the ‘compressed’ packets before the context is used to restore the packet header to its uncompressed state.
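The 80-bit traceback layout described in this overview (32-bit LEFT, 32-bit RIGHT, 8-bit DIST, 8-bit CID) and the all-ones IP ID marker can be sketched with Python's `struct` module. The field order, the network byte order and the helper names are assumptions for illustration, since the exact layout of Figure 2 is not reproduced in the text.

```python
import ipaddress
import struct

# LEFT (4 bytes), RIGHT (4 bytes), DIST (1 byte), CID (1 byte), network byte order.
# The ordering of the fields is an assumption for this sketch.
TRACEBACK_FORMAT = "!4s4sBB"
SNITCH_IP_ID = 0xFFFF  # all 16 bits of the IP ID field set to one


def pack_traceback(left: str, right: str, dist: int, cid: int) -> bytes:
    """Encode the four traceback fields into 10 bytes (80 bits)."""
    return struct.pack(
        TRACEBACK_FORMAT,
        ipaddress.IPv4Address(left).packed,
        ipaddress.IPv4Address(right).packed,
        dist,
        cid,
    )


def unpack_traceback(data: bytes) -> tuple[str, str, int, int]:
    """Decode 10 bytes back into (LEFT, RIGHT, DIST, CID)."""
    left, right, dist, cid = struct.unpack(TRACEBACK_FORMAT, data)
    return str(ipaddress.IPv4Address(left)), str(ipaddress.IPv4Address(right)), dist, cid


def is_snitch(ip_id: int) -> bool:
    """A packet is treated as carrying SNITCH data when its IP ID is all ones."""
    return ip_id == SNITCH_IP_ID
```

With 8 bits for DIST, a mark can record a distance of up to 255 hops, which leaves 64 of the 144 bits unused in this layout.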

3.3.1 Method 1: SNITCH using Single Edge Sampling

Once a router decides to mark a packet it will insert its IP address into the LEFT field and set the DIST field to zero. If the context of the packet to be marked does not match the context of the preceding full header packet then the second packet will be forwarded with a full header bearing a new CID to establish the context to be used for subsequent SNITCH header compression/decompression. If the next downstream router decides not to mark the packet, it determines if the packet contains traceback data by examining the IP ID field and then evaluates the DIST field. If the DIST field value is zero the router inserts its IP address into the RIGHT field and increments the DIST field to one. If the downstream router is not going to mark the packet and the DIST field is nonzero it is mandatory that the router increment the DIST field. The process of attack path reconstruction utilizes the alignment of overlapping IP


Figure 3: Attack Path Reconstruction.

Figure 4: Traceback Header Fields Using the XOR Function.

addresses in packets from adjacent routers (Figure 3). The first step in determining the attack path is to order the packets containing


edge data by the DIST field. The LEFT field of packets with DIST field equal to zero contains the IP addresses of the routers closest to the victim. This will be the first of the IP addresses used to reconstruct the attack paths. Packets having DIST fields equal to one will contain the IP address of the first router from the victim in the RIGHT field and the IP address of the second router from the victim in the LEFT field. The LEFT field of the router closest to the victim (DIST=0) is compared with the RIGHT field found in packets from adjacent routers (DIST=1). If these corresponding IP addresses are identical, the packets are considered part of the same candidate attack path. The process of comparing the LEFT and RIGHT fields in packets from adjacent routers is repeated until


Figure 5: Traceback Header Fields Using the XOR Function plus Bit Rotation.

the path cannot be extended any further. The use of Single Edge Sampling allows for an alignment overlap of one IP address. While this overlap is sufficient to accurately determine attack paths during DoS events the number of false positive paths generated during DDoS events can be substantial.
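The marking and reconstruction steps of Method 1 can be sketched in Python as follows. Marks are modeled as hypothetical `(LEFT, RIGHT, DIST)` tuples and routers as strings; header compression, the CID and the IP ID check are left out, and the function names are illustrative only.

```python
import random


def process(mark, router, marks_packet):
    """Apply one router's Single Edge Sampling step to a (LEFT, RIGHT, DIST) mark."""
    if marks_packet:
        return (router, None, 0)       # start a new edge: LEFT = this router, DIST = 0
    if mark is None:
        return None                    # packet carries no traceback data
    left, right, dist = mark
    if dist == 0:
        return (left, router, 1)       # complete the edge: RIGHT = this router
    return (left, right, dist + 1)     # otherwise only increment DIST


def send(path, probability, rng):
    """Forward one packet along `path` (attacker side first); return the victim's view."""
    mark = None
    for router in path:
        mark = process(mark, router, rng.random() < probability)
    return mark


def reconstruct(marks):
    """Chain edges by matching LEFT at DIST d-1 with RIGHT at DIST d (victim side first)."""
    by_dist = {}
    for left, right, dist in dict.fromkeys(marks):   # drop duplicate marks
        by_dist.setdefault(dist, []).append((left, right))
    partial = [[left] for left, _ in by_dist.get(0, [])]
    finished, dist = [], 1
    while partial:
        extended = []
        for path in partial:
            upstream = [l for l, r in by_dist.get(dist, []) if r == path[-1]]
            if upstream:
                extended.extend(path + [l] for l in upstream)
            else:
                finished.append(path)  # the path cannot be extended any further
        partial, dist = extended, dist + 1
    return finished
```

Because only one address overlaps between consecutive edges, two attack paths that share a router at the same distance produce every combination of their upstream edges, which is where the false positive paths during DDoS events come from.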

3.3.2 Method 2: SNITCH using XOR

As in the previously described method, when marking a packet the router inserts its IP address into the LEFT field. Unlike the previous method, however, the router also inserts its IP address into the RIGHT field. The two methods also differ in how subsequent routers handle the packet. Each successive downstream router that does not mark the packet XORs its IP address with the contents of the LEFT field (creating an XOR cascade) and then increments the DIST field. Upon arrival at the victim, the LEFT field of marked packets will contain the XOR cascade starting from the initial marking router; the RIGHT field will contain the IP address of the initial marking router (Figure 4). As with the previous method, the traceback data will be extracted before the decompressor restores the IP header to its uncompressed state.

To reconstruct the attack path, any duplicate packets are eliminated and the received packets are ordered by the DIST field. Path reconstruction begins by pairing packets with DIST = 0 and DIST = 1. The value of the LEFT field of a packet (DIST = 0) is XORed with the value of the RIGHT field of a packet (DIST = 1) and the result is compared with the value in the LEFT field of the packet with DIST = 1. Packets are considered part of the same candidate path if the values match. The router IP addresses for the candidate path are contained in the RIGHT field of the matched packets. This procedure is repeated by pairing packets containing sequential DIST fields until the path can no longer be extended. SNITCH using XOR vastly decreases the number of false positive paths generated by encoding multiple edge information into the packets used for traceback.

Figure 6: Multiple overlapping attack paths during DDoS.

3.3.3 Method 3: SNITCH using XOR plus Bit Rotation

The major difference between Methods 2 and 3 is the use of bit rotation on the IP addresses. Bit rotation is applied to each IP address before storage in marked packets and when determining router IP addresses during path reconstruction. During packet marking the bits in an IP address (represented here by an eight-bit number for illustrative purposes) are rotated to the right by the number of bits equal to one that are present in the address, such that 10010110 becomes 01101001. During path reconstruction the bits in an IP address are rotated to the left by the number of bits equal to one that are present in the address, such that 01101001 becomes 10010110.

The packet marking procedure is similar to the one described in Method 2. When a packet is to be marked, the router inserts its bit rotated IP address into the LEFT and RIGHT fields. Each successive downstream router that does not mark the packet XORs its bit rotated IP address with the contents of the LEFT field and then increments the DIST field. Upon arrival at the victim, the LEFT field of marked packets will contain the bit rotated XOR cascade starting from the initial marking router; the RIGHT field will contain the bit rotated IP address of the initial marking router (Figure 5).

Figure 5 shows an example of Method 3. The LEFT fields contain the result of the XOR cascade after bit rotation. The RIGHT fields contain the IP address of the marking router after bit rotation. The DIST field represents the number of hops from the marking router to the victim (minus one). The attack paths are the same as those used in Figure 4, with bit rotation applied to the IP addresses before the XOR function is applied (LEFT) or before the addresses are stored (RIGHT). Note that Attack Path #1 and Attack Path #2 no longer generate identical XOR cascade values (at DIST = 2 and DIST = 3). This method dramatically reduces, but does not eliminate, the ambiguities that lead to the generation of false positives during path reconstruction.

As with the previous method, the traceback data will be extracted before the decompressor restores the IP header to its uncompressed state. To reconstruct the attack path, any duplicate packets are eliminated and the received packets are ordered by the DIST field. Path reconstruction begins by pairing packets with DIST = 0 and DIST = 1. The (already bit rotated) value of the LEFT field of a packet (DIST = 0) is XORed with the (already bit rotated) value of the RIGHT field of a packet (DIST = 1) and the result is compared with the value in the LEFT field of the packet with DIST = 1. Packets are considered part of the same candidate path if the values match. The (already bit rotated) router IP addresses for the candidate path are contained in the RIGHT field of the matched packets. This procedure is repeated by pairing packets containing sequential DIST fields until the path can no longer be extended. To determine the actual router IP addresses, the RIGHT fields of packets from a candidate path are bit rotated to the left by the number of ones present.
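The XOR cascade and the popcount-driven rotation can be sketched together, since Method 2 is the same procedure with the rotation switched off. The following Python is a minimal model under stated assumptions: `Mark`, `mark_from`, `reconstruct`, and the documentation-range addresses are hypothetical, each router is assumed to mark deterministically (the probabilistic choice is omitted), and header compression is not modelled.

```python
import ipaddress
from dataclasses import dataclass
from typing import Callable, List


def rotr_by_popcount(x: int, width: int = 32) -> int:
    """Rotate right by the number of one bits (applied before marking)."""
    k = bin(x).count("1") % width
    return ((x >> k) | (x << (width - k))) & ((1 << width) - 1)


def rotl_by_popcount(x: int, width: int = 32) -> int:
    """Rotate left by the number of one bits (undoes rotr_by_popcount)."""
    k = bin(x).count("1") % width
    return ((x << k) | (x >> (width - k))) & ((1 << width) - 1)


def identity(x: int) -> int:
    return x  # pass as encode/decode to obtain plain Method 2 behaviour


@dataclass(frozen=True)
class Mark:
    left: int   # XOR cascade of encoded addresses, marker through last hop
    right: int  # encoded address of the marking router
    dist: int   # hops travelled since marking


def mark_from(path: List[int], marker: int,
              encode: Callable[[int], int] = rotr_by_popcount) -> Mark:
    """Mark seen at the victim if path[marker] marks and no later router re-marks."""
    left = right = encode(path[marker])
    dist = 0
    for ip in path[marker + 1:]:
        left ^= encode(ip)  # non-marking routers extend the cascade
        dist += 1
    return Mark(left, right, dist)


def reconstruct(marks, decode: Callable[[int], int] = rotl_by_popcount):
    """Pair packets with sequential DIST values; paths run from the victim outwards."""
    by_dist = {}
    for mk in sorted(set(marks), key=lambda mk: (mk.dist, mk.left, mk.right)):
        by_dist.setdefault(mk.dist, []).append(mk)
    frontier = [(mk.left, [mk.right]) for mk in by_dist.get(0, [])]
    finished, dist = [], 1
    while frontier:
        grown = []
        for cascade, rights in frontier:
            # LEFT(k-1) XOR RIGHT(k) must equal LEFT(k) for packets on one path.
            ext = [(mk.left, rights + [mk.right]) for mk in by_dist.get(dist, [])
                   if cascade ^ mk.right == mk.left]
            if ext:
                grown.extend(ext)
            else:
                finished.append([decode(r) for r in rights])
        frontier, dist = grown, dist + 1
    return finished


# Attacker side first; the last address is the router adjacent to the victim.
path = [int(ipaddress.IPv4Address(a))
        for a in ("192.0.2.33", "198.51.100.7", "203.0.113.1")]
marks = [mark_from(path, i) for i in range(len(path))]
for candidate in reconstruct(marks):
    print([str(ipaddress.IPv4Address(ip)) for ip in candidate])
```

Because rotation does not change the number of one bits in a word, the left rotation during reconstruction always recovers the original address.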

4 Discussion

We have proposed three related PPM methods that attempt to overcome many of the limitations attributed to IP traceback schemes. Our first proposed method, SNITCH using Single Edge Sampling, is effective for DoS attacks and for DDoS attacks that have well-differentiated paths. This method may be subject to attack path ambiguities (false positive edge combinations) under DDoS attacks where multiple overlapping attack paths diverge and then converge. Figure 6 illustrates how attack paths from multiple attackers share several common routers closest to the victim and then become distinct at some point further from the victim. The routers shared by all paths are closest to the victim, while routers shared by some, but not all, paths are at an increased distance from the victim. Ambiguities leading to the generation of false positive paths during attack path reconstruction arise in these overlapping divergent/convergent path regions and result from the single IP address overlap used during attack path reconstruction. The impact of these falsely reconstructed paths depends on whether the exact path must be determined or whether identification of the attack source is paramount.

Figure 7: Number of Attackers vs. Number of False Paths Generated.

The second proposed method, SNITCH using XOR, is an improvement over the previously described method as it decreases attack path ambiguities by several orders of magnitude (Figure 7). This is accomplished by encoding multiple edge information into the packets used for traceback. Additional edge data inserted into traceback packets allows for a greater degree of overlap during path reconstruction. This solution is only effective as long as the number of overlapping IP addresses in the path is less than or equal to the degree of overlap used during path reconstruction. All of the attack path routers can be identified, but some intra-path ambiguities remain. These ambiguities can be attributed to the use of the XOR function. Certain combinations of numbers can XOR to the same value (e.g., 125 ⊕ 110 = 19 and 106 ⊕ 121 = 19), thus leading to false packet matches during path reconstruction. Figure 4 illustrates how two similar, but different, attack paths can generate identical LEFT field values when the XOR function is applied to certain combinations of IP addresses. The XOR values for the LEFT fields of packets at distances equal to 2 or 3 are the same even though the packets do not follow identical paths. However, these intra-path ambiguities do not fully inhibit the identification of the attackers. The impact of these false positive paths depends on whether the intention is to determine the exact attack path or to determine where the attack was initiated.
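The XOR collision quoted in the text can be checked directly, along with the effect of the rotation step introduced in Method 3. The short Python sketch below uses 8-bit values purely for illustration, as Section 3.3.3 does; the helper name `rotr_by_popcount` is hypothetical and the example says nothing about how often such collisions occur for real 32-bit addresses.

```python
def rotr_by_popcount(x: int, width: int = 8) -> int:
    """Rotate right by the number of one bits in x (8-bit words for illustration)."""
    k = bin(x).count("1") % width
    return ((x >> k) | (x << (width - k))) & ((1 << width) - 1)


pairs = [(125, 110), (106, 121)]

# Plain XOR: both pairs collapse to the same value, so they are indistinguishable.
plain = [a ^ b for a, b in pairs]

# Rotating each operand first makes the two results differ for this example.
rotated = [rotr_by_popcount(a) ^ rotr_by_popcount(b) for a, b in pairs]

print("plain XOR:  ", plain)
print("rotated XOR:", rotated)
```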

Our third proposed method, SNITCH using XOR plus Bit Rotation, is a further evolution of Method 2 that decreases false positives during path reconstruction to a minimal level. All of the attack path routers can be identified, but there remains a low level of intra-path ambiguities that can be attributed to the use of bit rotation followed by application of the XOR function. Our results, based on actual traceroute data [25], indicate that a limited number of combinations of bit rotated numbers (>0.5%) can XOR to the same value, thus leading to minimal, but detectable, false packet matches during path reconstruction (Figure 7). Using the same attack paths as Figure 4, Figure 5 illustrates how bit rotation before application of the XOR function can prevent the ambiguities that arise from applying the XOR function alone. The XOR values for the LEFT fields of packets at distances equal to 2 or 3 no longer produce ambiguous results.

We have created simulated implementations of the three methods described above utilizing actual traceroute data obtained from Lucent Bell Labs [25]. The single source of the data set is used as the victim and a selected subset of the traceroute data set is used as the upstream routers (5067 paths, max. path length = 29 hops). The simulation then probabilistically marks the packets at routers along the attack path. Traceback data extracted from packets received during the simulation is used to reconstruct the attack path. The packet marking probability (m) and the maximum number of packets (n) sent by the attacker may be altered to observe the effects that these parameters have on path reconstruction.

PPM techniques insert traceback data into packets with a prescribed probability of occurrence (m) as they are routed through the network. Theilmann [26] has shown that path lengths (d) in the Internet seldom exceed 25 hops. A packet marking frequency m ≤ 1/d is considered optimal [8]. Thus, we use m = 1/25 (4%) for the marking frequency. To explore the relationship between m and the degree of successful path reconstruction, simulations were run holding n and d constant (700 and 29, respectively) while varying m from 0% to 100% (Figure 8). We were able to reconstruct ≥ 99% of the paths for m = 2% and m = 6-8% (1/50 and 1/16.67 to 1/12.5), and 100% of the paths for m = 3-5% (1/33.3 to 1/20). If the chosen marking probability is too low, an insufficient number of packets will be received from routers closest to the victim. Conversely, if the chosen marking probability is too high, an insufficient number of packets will be received from routers closest to the attacker, as downstream routers will overwrite previously marked packets.

Figure 8: Packet Marking Probability vs. Percent of Paths Found.

A main drawback of PPM techniques is the number of false positive paths generated during path reconstruction, especially as the number of simultaneous attacks increases during DDoS attacks [9]. Our main aim was to optimize the ability to accurately detect 100% of the paths while minimizing the number of false positive paths generated. The number of false positive paths generated for 5067 simultaneous attackers was: Method 1, estimated to be >600 000; Method 2, 2425 (~48%); Method 3, 22 (0.43%) (Figure 7, n = 700, m = 4%). Method 3 was able to determine 100% of the attack paths after receiving approximately 700 packets for path lengths of 29 hops and as few as 300 packets for path lengths of four hops. Figure 9 shows the relationship between the number of packets (n), degree of path reconstruction and path length (d) for SNITCH using XOR plus Bit Rotation (m = 4%).

Figure 9: Path Reconstruction Response Surface.

As with other traceback methodologies, including packet marking, several limitations exist. Reordering or loss of packets may occur, thus making it necessary to receive an increased volume of packets for accurate attack path reconstruction. Traceback procedures would have to be modified if multipath routing becomes prevalent, as the attack path would not remain stable long enough to generate sufficient tracing data. Another shortcoming is the inability to handle indirection/reflection (compromised slave hosts/zombies) and concealment (chains of false accounts). The methods we describe do not insert traceback data into fragmented packets, as RFC 2507 does not compress the headers of such packets. Since traceback data is generated by the router closest to the attacker and not by the host computer, some degree of ISP or organizational cooperation would be required to identify the actual attack source. The identified attacker host may implicate a particular individual, or it may be so publicly accessible as to make such an association impossible [5]. Depending on the desired outcome (halting the attack versus prosecution), this may be sufficient.
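Returning to the choice of marking probability discussed earlier in this section, the trade-off can be illustrated with the standard PPM analysis of Savage et al. [8]: a mark written by a router d hops from the victim survives only if no later router overwrites it, which happens with probability m(1 - m)^(d-1), and the expected number of packets needed to hear from every router on the path is bounded by ln(d) / (m(1 - m)^(d-1)). The Python sketch below evaluates these expressions; the function names are hypothetical, and applying the bound to SNITCH unchanged is an assumption rather than a result from our simulations.

```python
import math


def arrival_probability(m: float, d: int) -> float:
    """Chance a packet reaches the victim carrying the mark of the router d hops away."""
    return m * (1.0 - m) ** (d - 1)


def expected_packets_bound(m: float, d: int) -> float:
    """Upper bound on the expected packets needed to see all d routers (per [8])."""
    return math.log(d) / arrival_probability(m, d)


d = 25  # path length assumed in the text
for m in (0.01, 0.04, 0.20):
    near = arrival_probability(m, 1)
    far = arrival_probability(m, d)
    bound = expected_packets_bound(m, d)
    print(f"m={m:.2f}  nearest={near:.4f}  farthest={far:.6f}  bound={bound:.0f}")
```

The survival probability for the farthest router peaks at m = 1/d, which is consistent with the choice of m = 1/25 above: a smaller m yields few marks from any router, while a larger m causes marks from distant routers to be overwritten.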

5 Conclusion and future work

We have developed three variations of the SNITCH PPM concept that utilize the space made available by IP header compression. Each successive method refines the previous approach by decreasing the number of false positive paths that arise during attack path reconstruction. Though each method increases in complexity, the ultimate result is an efficient, fast, reliable, deployable and practical solution to the IP traceback problem. Currently we are focusing on how to utilize the remaining space available in the compressed headers to eliminate false positives and further reduce the number of packets required for accurate IP traceback.

References

[1] Computer Security Institute with participation from the San Francisco FBI Computer Intrusion Squad. 2002 CSI/FBI Computer Crime and Security Survey.
[2] Houle, K.J. and Weaver, G.M., 2001. Trends in Denial of Service Attack Technology, CERT Coordination Center, Carnegie Mellon University, October 2001.
[3] Levine, D. and Kessler, G., 2002. Denial of Service Attacks. Computer and Security Handbook, Wiley, 2002.
[4] Bellovin, S.M., 1989. Security Problems in the TCP/IP Protocol Suite. ACM Computer Communications Review, Vol. 19(2), 1989, pp. 32-48.
[5] Lee, S.C. and Shields, C., 2001. Tracing the Source of Network Attack: A Technical, Legal and Societal Problem. Proc. of the 2001 IEEE Workshop on Information Assurance and Security, United States Military Academy, West Point, New York, June 2001.
[6] Dean, D., Franklin, M. and Stubblefield, A., 2002. An Algebraic Approach to IP Traceback. ACM Transactions on Information and System Security, Vol. 5, 2002, pp. 119-137.
[7] Lee, W. and Park, K., 2001. On the Effectiveness of Probabilistic Packet Marking for IP Traceback under Denial of Service Attack. Proceedings of the IEEE INFOCOM01, Anchorage, AK, USA, 2001.
[8] Savage, S., Wetherall, D., Karlin, A. and Anderson, T., 2001. Network Support for IP Traceback. IEEE/ACM Transactions on Networking, Vol. 9(3), 2001, pp. 226-237.
[9] Song, D. and Perrig, A., 2001. Advanced and Authenticated Marking Schemes for IP Traceback. Proceedings of the IEEE INFOCOM01, Anchorage, AK, USA, 2001.
[10] Degermark, M., Nordgren, B. and Pink, S., 1999. IP Header Compression. RFC 2507, IETF, Network Working Group, Category: Standards Track, February 1999.
[11] Jacobson, V., 1990. Compressing TCP/IP Headers for Low-Speed Serial Links. RFC 1144, IETF, Network Working Group, February 1990.
[12] Baba, T. and Matsuda, S., 2002. Tracing Network Attacks to Their Sources. IEEE Internet Computing, March/April 2002, pp. 20-26.
[13] Denial of Service Attacks. CERT Coordination Center, Carnegie Mellon University, Tech Tips, Initial release 1997, Updated June 2001.
[14] Ferguson, P. and Senie, D., 2000. Network Ingress Filtering: Defeating Denial of Service Attacks Which Employ IP Source Address Spoofing. RFC 2827, IETF, Network Working Group, Category: Best Current Practice, May 2000.
[15] Cisco Systems Inc. Using CAR (Committed Access Rate) During DOS Attacks, 2001.
[16] Stone, R., 2000. An IP Overlay Network for Tracking DoS Floods. Proceedings of the 9th Usenix Security Symposium, Denver, CO, USA, 2000.
[17] Burch, H. and Cheswick, B., 2000. Tracing Anonymous Packets to their Approximate Source. Proceedings of the 14th Conference on Systems Administration, 2000 LISA XIV, New Orleans, Louisiana, USA, 2000.
[18] Schnackenberg, D., Djahandari, K. and Sterne, D., 2000. Infrastructure for Intrusion Detection and Response. Proceedings for DISCEX, January 2000.
[19] Schnackenberg, D., Djahandari, K., Sterne, D., Holiday, H. and Smith, R., 2001. Cooperative Intrusion Traceback and Response Architecture (CITRA). Proceedings of the 2nd DARPA Information Survivability Conference and Exposition, June 2001.
[20] Snoeren, A.C., Partridge, C., Sanchez, L.A., Jones, C.E., Tchakoutio, F., Kent, S.T. and Strayer, S.T., 2001. Hash-Based IP Traceback. Proceedings of ACM SIGCOMM 2001, August 2001.
[21] Bellovin, S., Leech, M. and Taylor, T., 2001. ICMP Traceback Messages. Internet Draft, IETF, October 2001.
[22] Wu, S.F., Zhang, L., Massey, D. and Mankin, A., 2001. Intention-Driven ICMP Traceback. Internet Draft, IETF, February 2001.
[23] Paxson, V., 1997. End-to-End Routing Behavior in the Internet. IEEE/ACM Transactions on Networking, Vol. 5(5), 1997, pp. 601-615.
[24] Reuters Ltd., London. Cisco Gaining Share In Routers, Switches. The Mercury News, posted on 16 May 2002.
[25] Internet Mapping Project, http://cm.belllabs.com/who/ches/map/dbs/index.html.
[26] Theilmann, W. and Rothermel, K., 2000. Dynamic Distance Maps of the Internet. Proceedings of IEEE INFOCOM 2000, Tel Aviv, Israel, 2000.