Hash-Based IP Traceback

Hash-Based IP Traceback • Alex C. Snoeren, Craig Partridge, Luis A. Sanchez, Christine E. Jones, Fabrice Tchakountio, Stephen T. Kent, and W. Timothy Strayer • SIGCOMM, Aug. 2001, San Diego, CA • Presented by Chris Dion

Tonight’s Outline • Introduction to the problem • What is IP Traceback? • Some Previous Work • Overview of the Proposed Solution • Implementation/Simulation

Internet Anonymity • Not all attacks are large flooding DoS attacks • Well-placed single-packet attacks can be just as effective • These packets can be spoofed to appear to come from almost anywhere • How can we track these attacks and find their origin?

Current Methods • Use of ingress filtering to restrict source addresses • Not all routers can look at every packet’s source address • Spoofed addresses are all too often legitimate: • NAT • Mobile IP • Hybrid satellite architectures

IP Traceback • Some assumptions about the network • Packets may be multicast or broadcast • Tracing system must be prepared for multiple packets • Attackers can get into routers • Tracing must not be confounded by a motivated attacker • Routing behavior of the network can be unstable • Tracing must be prepared to handle divergent information • Packet size should not grow due to tracing • End hosts may be resource constrained • Tracing is an infrequent operation • Can use a router’s control path instead of its data path

[Figure: attack path from the attacker to the victim through possibly compromised routers; labels: Attack packet #1, Attack packet #2, Victim]

Packet Transformations • Packets may be modified for a number of valid reasons • Packet fragmentation • IP option processing • ICMP processing • Packet duplication • NAT • IPsec tunneling • Less than 3% of Internet traffic in 2000 • Attackers can use these!

Some Previous Work • Two approaches to determining the route: • Audit of the flow as it traverses the network • Can grow the packet with route information, use fields in the header, or use out-of-band signaling • Inference of the flow based on its impact on the state of the network • Systematically floods the network and watches for variations in the received packet flow • Becomes infeasible when flow sizes approach a single packet

Packet Digests • We do not need to store the entire packet • A digest reduces storage requirements • Need only the packet header to determine the attacker • Still need to uniquely identify the packet • Security concerns • Mask out fields that are modified along a packet’s route: • Type of Service • TTL • Checksum • IP Options
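The masking step can be illustrated with a minimal Python sketch. It assumes an IPv4 packet supplied as raw bytes, and it assumes the 28-byte hash input is the 20-byte fixed header plus the first 8 bytes of payload. The function name `digest_input` and the constant are hypothetical names chosen for illustration, not part of SPIE.

```python
PAYLOAD_PREFIX_BYTES = 8  # assumption: 20-byte fixed header + 8 payload bytes = 28


def digest_input(packet: bytes) -> bytes:
    """Return the masked bytes of an IPv4 packet that would feed the hash."""
    ihl = (packet[0] & 0x0F) * 4        # real header length, options included
    header = bytearray(packet[:20])     # keep the fixed header; options are dropped
    header[1] = 0                       # mask Type of Service
    header[8] = 0                       # mask TTL
    header[10:12] = b"\x00\x00"         # mask header checksum
    payload_prefix = packet[ihl:ihl + PAYLOAD_PREFIX_BYTES]
    return bytes(header) + payload_prefix
```

Two copies of a packet observed at different hops differ in TTL and checksum, yet yield the same hash input, while a change of source address yields a different one.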

IP Packet fields for Hash Input

Why 28 bytes? • WAN trace from an OC-3 gateway router • LAN trace from an active 100 Mb/s segment • Collision rate for a 28-byte prefix: • 0.00092% WAN • 0.139% LAN

Bloom Filters • Used to store digests in the router • From Communications of the ACM, July 1970 • Computes k distinct packet digests for each packet using hash functions • Uses the results to index into a bit array • Can produce false positives
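A minimal Python sketch of such a filter follows. The k hash functions are simulated by prefixing an index byte to SHA-256, which is a stand-in chosen for this illustration and not the hash family used by SPIE. The class and parameter names are hypothetical.

```python
import hashlib


class BloomFilter:
    """Bit array of 2**n_bits entries indexed by k n-bit digests."""

    def __init__(self, n_bits: int, k: int):
        self.n_bits = n_bits                        # width of each digest (>= 3)
        self.k = k                                  # number of hash functions
        self.bits = bytearray((1 << n_bits) // 8)   # 2**n_bits bits, all zero

    def _indexes(self, data: bytes):
        # Assumption: an index byte + SHA-256 stands in for k distinct hashes.
        for i in range(self.k):
            h = hashlib.sha256(bytes([i]) + data).digest()
            yield int.from_bytes(h[:8], "big") % (1 << self.n_bits)

    def add(self, data: bytes) -> None:
        for idx in self._indexes(data):
            self.bits[idx >> 3] |= 1 << (idx & 7)

    def __contains__(self, data: bytes) -> bool:
        # True if every indexed bit is set: "possibly seen" (may be a false positive).
        return all(self.bits[idx >> 3] & (1 << (idx & 7))
                   for idx in self._indexes(data))
```

A membership test never misses a stored packet. It can only err by reporting a packet that was never stored, which is why the false-positive rate has to be controlled.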

[Figure: Bloom filter with k hash functions, each producing an n-bit digest for every packet received]

Bloom Filters (cont) • Restrictions on the hash family • Must evenly distribute a highly correlated set of inputs (packet digests) • Independent collision events (false positives at one router are independent of those at neighboring routers) • Called universal hash families • Must be easy to compute at high link speeds

Source Path Isolation Engine

SPIE System • DGA – Data Generation Agent • Produces a packet digest of each departing packet and stores it in a digest table • Each table represents the traffic forwarded in a given time interval • SCAR – SPIE Collection and Reduction Agent • When an attack is detected, the SCAR produces the attack graph for its region • STM – SPIE Traceback Manager • Interface to the intrusion detection system • Gathers the complete attack graph

Traceback Processing • IDS will signal a potential attack and give the STM: • Packet P • Victim V, which must be expressed in terms of the last-hop routers • Time of attack T, which must be reported in a timely fashion • STM immediately asks all SCARs in the domain to poll their DGAs for digests • Each SCAR returns an attack graph, then the STM works backwards to identify the source

What if Packet is Transformed? • Need a TLT – Transform Lookup Table entry with each packet digest: • IP packet digest • Indirect flag • Type of transform (ICMP, NAT, etc.) • Variable packet data needed for the transform

Graph Construction • Each SCAR is responsible for its region • After gathering all digest tables, it simulates reverse-path flooding (RPF) • If the packet is found in a router, the node is marked and the arrival time becomes the latest possible time to search
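The simulated reverse-path flooding can be sketched in Python as a breadth-first search outward from the victim's router. This is a simplified illustration: it ignores the time windows described above, and the function name, argument names, and data layout are all assumptions.

```python
from collections import deque


def build_attack_graph(neighbors, digest_tables, victim_router, digest):
    """Return (upstream, downstream) edges the packet may have traversed.

    neighbors:     dict mapping router -> iterable of adjacent routers
    digest_tables: dict mapping router -> container supporting `in`
    """
    if digest not in digest_tables[victim_router]:
        return []                       # the last-hop router never saw the packet
    marked = {victim_router}
    edges = []
    queue = deque([victim_router])
    while queue:
        router = queue.popleft()
        for upstream in neighbors[router]:
            if upstream in marked:
                continue                # already part of the graph
            if digest in digest_tables[upstream]:
                marked.add(upstream)    # packet was seen here: mark the node
                edges.append((upstream, router))
                queue.append(upstream)  # keep flooding backwards from it
    return edges
```

Only routers whose digest table contains the packet are expanded, so the search is confined to the attack path plus any false positives reported by the tables.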

[Figure: Graph Construction Example, showing attack paths and SPIE queries]

Implementation • Universal hash family is simulated using MD5 hashing (128-bit output) • A random number is prepended to each packet for independence • Output is taken as four 32-bit digests • Size of the digest table varies with the total traffic capacity of the router
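A short Python sketch of this digesting step, under stated assumptions: the random number is modelled as a per-router byte string `salt`, the 128-bit MD5 output is split in big-endian order, and the function name is hypothetical. The byte order and salt length are illustrative choices that the slide does not specify.

```python
import hashlib
import struct


def packet_digests(salt: bytes, masked_prefix: bytes) -> tuple:
    """Return four 32-bit digests for one packet.

    salt:          random value prepended so routers hash independently
    masked_prefix: the invariant packet bytes used as hash input
    """
    md5 = hashlib.md5(salt + masked_prefix).digest()   # 16 bytes = 128 bits
    return struct.unpack(">4I", md5)                   # four unsigned 32-bit ints
```

Because each router draws its own salt, the same packet maps to unrelated bit positions at different routers, which keeps false positives at neighboring routers independent.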

Possible DGA in hardware

False Positive Analysis • Use a false-positive probability of p = 1/(8d) as a theoretical limit (d = degree of the router’s neighbors) • Assume a 32-node path length, approaching the diameter of the Internet • Simulation used the topology of a major ISP • 70 backbone routers with links from T-1 (1.544 Mbps) to OC-3 (155 Mbps) • Sent 1000 attack packets at a constant rate to one victim, with background traffic set to give a fixed false-positive rate P

Simulation Result • Low value was due to link utilizations • Considerable Gap between theoretical and simulation

Time and Memory Analysis • Allow one minute to identify an attack packet • Memory is linear in link capacity • Consider a Bloom filter with 3 digesting functions and a capacity factor of 5, for a false-positive rate of P = 0.092 when full • Average-sized packets (1000 bits) • From this we get a rule of thumb: • SPIE requires roughly 0.5% of total link capacity per unit time in storage
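Both figures on this slide can be reproduced with a few lines of Python. The sketch uses the standard Bloom filter approximation (1 - e^(-k/c))^k for a full filter with k hash functions and capacity factor c (bits per stored packet). Applying that approximation here, and the function names, are assumptions of this illustration.

```python
import math


def bloom_false_positive_rate(k: int, capacity_factor: float) -> float:
    """Approximate false-positive rate of a full Bloom filter."""
    return (1 - math.exp(-k / capacity_factor)) ** k


def storage_fraction(capacity_factor: float, packet_bits: int) -> float:
    """Filter bits stored per bit of traffic forwarded."""
    return capacity_factor / packet_bits


print(round(bloom_false_positive_rate(3, 5), 3))   # 0.092
print(storage_fraction(5, 1000))                   # 0.005, i.e. 0.5%
```

The rule of thumb follows directly: five filter bits are kept for every 1000-bit packet, so storage grows at 0.5% of the rate at which traffic is forwarded.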

Time and Memory Analysis (cont) • 4 OC-3 links = 47 MB of storage for one minute • 32 OC-192 links = 23.4 GB for one minute • Access time is also important • Given a DRAM cycle time of 50 ns, routers processing more than one OC-192 link will need SRAM (only 16 Mb, which must be paged)

Some Issues • Traceback may be requested when the network is unstable • Possibly from the attack itself • Best solution would be out-of-band management • Priority handling may work for in-band • ISP-ISP deployment • Possible sharing of SPIE infrastructure? • Grant STM requests to other domains

Conclusions • Traceback of a single packet is very difficult • SPIE’s key contribution is showing that it is feasible • Low storage • Does not aid in eavesdropping • Complete system • Future work could discard packet digests probabilistically as they age to allow for longer traceback times
