Hash-Based IP Traceback

Alex C. Snoeren+, Craig Partridge, Luis A. Sanchez++, Christine E. Jones, Fabrice Tchakountio, Stephen T. Kent, and W. Timothy Strayer • BBN Technologies • +MIT Laboratories • ++Megisto Systems • Published in SIGCOMM 2001

Authors have unknowingly contributed slides to this presentation 

DoS Attacks! • CSI/FBI 2001 Computer Crime Report • 61% IDS, 95% firewalls (sample 530) • 36% detected DoS attacks (sample 538) • 27.6% financial loss (sample 344) • GRC.com • 8 days of attacks • UDP fragmentation/ICMP flood attacks on 2 × T1 connections • 474 Windows PCs, coordinated • 2.4 billion packets!

Who is attacking? • IP Traceback • Trace the path of IP packet(s) to their source • Why is this difficult? • IP networks are stateless • Spoofed source addresses • Many administration layers

Approach: Log-Based Traceback [Figure: network of routers (R1 through R7 plus unlabeled routers R) between attacker A and victim V]

Logging Challenges • Attack path reconstruction is difficult • Packet may be transformed as it moves through the network • Full packet storage is problematic • Memory requirements are prohibitive at high line speeds (OC-192 is ~10Mpkt/sec) • Extensive packet logs are a privacy risk • Traffic repositories may aid eavesdroppers

Source Path Isolation Engine Goals • Trace a single IP packet back to source • Asymmetric attacks (e.g. Fraggle, Teardrop, ping-of-death) • Minimal cost (resource usage) • Maintain privacy (prevent eavesdropping) • Robustness (min. false pos., no false neg.)

Assumptions • Network: • Packets can be addressed to one or more hosts (multicast, broadcast) • Duplicate packets may exist in the network • Router infrastructure is unstable • End hosts have constrained network resources • Attacker: • Aware of Traceback mechanisms • Routers may be subverted • Mechanism: • Packet size should not grow due to Traceback • Traceback is infrequent?

Goals • Find attack graph for single packet • Minimal cost (resource usage) • Maintain privacy (prevent eavesdropping) • Robustness (min. false pos., no false neg.)

SPIE Architecture • DGA: Data Generation Agent • Computes and stores digests of each packet on the forwarding path • Deploy 1 DGA per router • SCAR: SPIE Collection and Reduction agent • Long-term storage for needed packet digests • Assembles attack graph for local topology • STM: SPIE Traceback Manager • Interfaces with IDS • Verifies integrity and authenticity of Traceback call • Sends requests to SCARs for local graphs • Assembles attack graph from SCAR input

[Figure: SPIE components: IDS, STM, SCARs, and DGAs attached to routers] • 1: IDS identifies attack packet • 2: Sends Packet, Time, Last Hop • 3: Authenticates and verifies IDS request • 4: Provisions SCARs to collect local DGA digests • 5: Collect digest tables, time intervals, hash functions • 6: Identify routers with Packet’s digest and construct graph • 7: Collect SCAR local graphs • 8: Assemble local graphs, query for missing info • 9: Send attack graph to IDS

Data Generation Agents • Compute “packet digest” • Store in Bloom filter • Flush filter every time interval, t

Packet Digests • Compute hash(p) • Invariant fields of p only • 28 bytes of hash input, 0.00092% WAN collision rate • Fixed-size n-bit hash output • Compute k independent digests • Increased robustness • Reduced collisions, reduced false positive rate

Hash input: Invariant Content [Figure: IPv4 header layout: Ver, HLen, TOS, Total Length, Identification, DF, MF, Fragment Offset, TTL, Protocol, Checksum, Source Address, Destination Address, Options, followed by First 8 bytes of Payload and Remainder of Payload; the hash input is 28 bytes]
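A minimal sketch of how the 28-byte hash input could be assembled from a raw IPv4 packet. The choice of masked fields (TOS, TTL, header checksum, with options dropped) is an assumption, since the shading of the slide figure did not survive; `digest_input` is a hypothetical helper name.

```python
import struct

def digest_input(packet: bytes) -> bytes:
    """Return a 28-byte invariant prefix: masked 20-byte header + 8 payload bytes."""
    ihl = (packet[0] & 0x0F) * 4       # header length in bytes, including options
    header = bytearray(packet[:20])    # fixed header only; options are dropped (assumption)
    header[1] = 0                      # mask TOS (assumption)
    header[8] = 0                      # mask TTL, decremented at every hop
    header[10:12] = b"\x00\x00"        # mask header checksum, recomputed at every hop
    payload = packet[ihl:ihl + 8]      # first 8 bytes of payload
    return bytes(header) + payload.ljust(8, b"\x00")

# Example: the same packet seen at two hops (TTL and checksum differ).
def example_packet(ttl: int, checksum: int) -> bytes:
    header = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 32, 7, 0, ttl, 17, checksum,
                         bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))
    return header + b"ABCDEFGHIJKL"

same = digest_input(example_packet(64, 0x1234)) == digest_input(example_packet(63, 0x1334))
```

Both views of the packet yield the same hash input, which is what lets every router on the path compute a matching digest.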

Hashing Properties • Each hash function • Uniform distribution of input -> output • H1(x) = H1(y) for some x ≠ y -> unlikely • Use k independent hash functions • Collisions among k functions independent • H1(x) = H2(y) for some x,y -> unlikely • Cycle k functions every time interval, t
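One way to picture "k independent hash functions, cycled every interval" is a salted hash family whose salts are re-derived per interval. The sketch below uses salted BLAKE2b from Python's `hashlib` purely for illustration; the slides do not say which hash functions SPIE uses, and `hash_family`, `router_secret`, and `interval` are hypothetical names.

```python
import hashlib

def hash_family(k: int, n_bits: int, router_secret: bytes, interval: int):
    """Return k functions mapping bytes to n-bit integers (n_bits <= 64)."""
    functions = []
    for i in range(k):
        # Derive a distinct 16-byte salt per (interval, function index).
        salt = hashlib.blake2b(
            router_secret + interval.to_bytes(8, "big") + bytes([i]),
            digest_size=16,
        ).digest()

        def h(data: bytes, salt: bytes = salt) -> int:
            d = hashlib.blake2b(data, digest_size=8, salt=salt).digest()
            return int.from_bytes(d, "big") >> (64 - n_bits)  # keep the top n bits

        functions.append(h)
    return functions

# A new interval index yields a fresh set of k functions.
current = hash_family(k=3, n_bits=16, router_secret=b"secret", interval=0)
rotated = hash_family(k=3, n_bits=16, router_secret=b"secret", interval=1)
```

Because the salts change each interval, a packet that collides under one set of functions is unlikely to collide again under the next.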

Digest Storage: Bloom Filters • Fixed structure size • Uses a 2^n-bit array • Initialized to zeros • Insertion • Use each n-bit digest as an index into the bit array • Set that bit to ‘1’ • Membership • Compute k digests, d1, d2, ..., dk • If filter[di] = 1 for all i, router forwarded packet [Figure: the n-bit digests H1(P), H2(P), H3(P), ..., Hk(P) each set one bit in the 2^n-bit array]
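The insertion and membership steps above can be sketched as a small Bloom filter. This is an illustration under assumptions, not the DGA implementation: the k digests are simulated by prefixing an index byte to SHA-256, and `DigestTable` is a hypothetical name.

```python
import hashlib

class DigestTable:
    """Bloom filter over a 2**n_bits bit array with k index-salted hashes (k <= 256)."""

    def __init__(self, n_bits: int, k: int):
        self.n_bits = n_bits
        self.k = k
        self.bits = bytearray(max(1, (1 << n_bits) // 8))  # initialized to zeros

    def _indices(self, packet: bytes):
        for i in range(self.k):
            d = hashlib.sha256(bytes([i]) + packet).digest()
            # Take the top n_bits of the first 64 bits as the array index.
            yield int.from_bytes(d[:8], "big") >> (64 - self.n_bits)

    def insert(self, packet: bytes) -> None:
        for idx in self._indices(packet):
            self.bits[idx >> 3] |= 1 << (idx & 7)

    def __contains__(self, packet: bytes) -> bool:
        # True only if every one of the k bits is set.
        return all(self.bits[idx >> 3] & (1 << (idx & 7))
                   for idx in self._indices(packet))

table = DigestTable(n_bits=16, k=3)
table.insert(b"packet-1")
```

A stored packet always tests positive, so there are no false negatives; a packet that was never stored can still test positive if other packets happened to set all k of its bits.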

Router Resource Tradeoffs: Maintain Same False Positive Rate • 4 variables: n, k, b, t • Small n, larger k, smaller t (limited Memory) • Large n, smaller k, larger t (limited CPU) • Small b, larger t (limited Bandwidth) • k -> n2, t -> 0 • n = memory, need n2 storage • k = CPU, need k*h(p) at line rates • b = bandwidth, number of neighbors • t = saturation rate, application responsiveness
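To get a feel for how n, k, and t interact, the sketch below evaluates the standard Bloom-filter approximation for the false positive rate, (1 - e^(-k·N/m))^k, with m = 2^n bits and N packets stored per interval. The formula is the textbook approximation rather than anything stated on the slide, the parameter b is not modelled, and the function and argument names are hypothetical.

```python
import math

def false_positive_rate(n_bits: int, k: int, packets_per_sec: float, t_seconds: float) -> float:
    """Approximate Bloom-filter false positive rate after one interval of traffic."""
    m = 2 ** n_bits                          # size of the bit array
    stored = packets_per_sec * t_seconds     # packets inserted before the flush
    return (1.0 - math.exp(-k * stored / m)) ** k

# Shorter intervals keep the filter emptier, which lowers the rate.
long_interval = false_positive_rate(n_bits=24, k=3, packets_per_sec=1e6, t_seconds=2.0)
short_interval = false_positive_rate(n_bits=24, k=3, packets_per_sec=1e6, t_seconds=1.0)
```

With memory fixed, flushing more often (smaller t) is the main lever for keeping the rate down, at the cost of more digest tables to retain for the same traceback window.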

SPIE Collection and Reduction Agent • Polls DGAs for digest tables, hash functions, time intervals • Time-critical operation • Constructs local attack graph • Reverse Path Flooding • For each router: • Compute k hashes of p with local hash functions • Membership test (table[hi(p)] == 1 for all i) • Sends result to STM
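Reverse path flooding can be sketched as a breadth-first search that starts at the last-hop router and only expands into neighbors whose digest table contains the packet. This is a simplified illustration: the per-router tables are plain Python sets standing in for Bloom filters, time intervals are ignored, and `reverse_path_flood`, `topology`, and `tables` are hypothetical names.

```python
from collections import deque

def reverse_path_flood(topology, tables, last_hop, packet):
    """Return attack-graph edges as (upstream, downstream) router pairs."""
    if packet not in tables[last_hop]:
        return []                      # the last hop never saw the packet
    edges = []
    seen = {last_hop}
    queue = deque([last_hop])
    while queue:
        router = queue.popleft()
        for neighbor in topology[router]:
            # Membership test against the neighbor's digest table.
            if neighbor not in seen and packet in tables[neighbor]:
                seen.add(neighbor)
                edges.append((neighbor, router))
                queue.append(neighbor)
    return edges

topology = {"R1": ["R2"], "R2": ["R1", "R3", "R4"], "R3": ["R2"], "R4": ["R2"]}
tables = {"R1": {b"p"}, "R2": {b"p"}, "R3": {b"p"}, "R4": set()}
graph = reverse_path_flood(topology, tables, last_hop="R3", packet=b"p")
```

In this toy topology the search follows R3 back through R2 to R1 and never expands R4, whose table does not contain the packet. With real Bloom filters, a false positive at a neighbor would add a spurious branch to the graph.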

SPIE Traceback Manager • Interface to IDS • Receives attack signature for p • Returns attack graph • Authenticates/Verifies (no details) • Provisions SCARs • Send(packet, last hop router, arrival time) • Assembles local graphs • Fills holes in graph

SPIE Performance • Local false positive rate (n, k,b) • Length of time digests are stored (t) • IDS->STM->SCAR->DGA • Accuracy of attack graphs • Derived from local false positive rates • Type of traffic, WAN = 0.00092%, LAN = 0.139% • Network topology • Why?

Fuzziness and assumptions!

Simulation Setup • ISP backbone, 70 routers, T-1 to OC-3 links • Avg. link utilization, topology for 1 week • Randomly selected attacker, victim • Send 1000 packets • 5000 sample size • Background traffic same • P = false positive rate

Degree-Independent Simulation Results [Figure: expected number of false positives (0 to 1) versus length of attack path in hops (0 to 30); curves: Random Graph; Real ISP, 100% Utilization; Real ISP, Actual Utilization]

Conclusion • Find attack graph for single packet • Log every packet at every router • Minimal cost (resource usage) • Store fixed-size hash(p), not p • 0.05% link bandwidth per unit time • Distribute graph creation (attack sub-graphs) • Maintain privacy (prevent eavesdropping) • Authenticate Traceback (IDS -> STM call) • No header fields stored • Robustness (min. false pos., no false neg.)?

Food for Thought • How important is privacy of IP packets? • Anyone with network access along the path can sniff packets • What about false negatives? • Communication latency? • Problems with small packet flows • More computation at end host -> longer detection cycle • Identify attack signature? • 28 bytes enough? • Flooding attacks cause higher false positive rates
