Traceback Pat Burke Yanos Saravanos
Agenda • Introduction • Problem Definition • Benchmarks and Metrics • Traceback Methods • Packet Marking • Hash-based • Conclusion • References
Why Use Traceback? • General Network Monitoring • Check users on FTP server • Network Threats • SPAM • DoS • Insider attacks
Why Use Traceback? • Network Threats • Worms / Viruses • Code Red (2001) spreading at 8 hosts/sec • Slammer Worm (2003) spreading at 125 hosts/sec • Illegal file sharing
Why Use Traceback? • Currently very difficult to find spammers, virus authors • Easy to spoof IPs • No inherent tracing mechanism in IP • The author of a Blaster worm variant left clues in the code and was eventually caught • What if we could trace packets back to their point of origin?
Packet Tracing • Monitoring applications currently exist • Ethereal, tcpdump, ngrep, etc. • Only work with untampered packets • Worms, viruses, and spam are sent with spoofed IPs from compromised computers • Need solutions that can trace all packets
Preliminary Solutions • Routers add identifiers to the packet as it moves along the Internet • Packet size increases with every hop • Effective throughput decreases very quickly • Routers keep a log of all the packets that have been routed • Large overhead required of all routers • Huge database containing packet information • When should you clear packet information?
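To make the first preliminary solution concrete, here is a minimal sketch, assuming each router appends its 4-byte IPv4 address to the packet. The function names `append_path_record` and `useful_fraction` and the router addresses are hypothetical. The overhead grows by 4 bytes per hop, so the useful share of every packet shrinks as the path gets longer.

```python
import ipaddress

ADDR_BYTES = 4  # an IPv4 address occupies 4 bytes


def append_path_record(packet_len: int, path: list) -> tuple:
    """Naive traceback: each router on `path` appends its IPv4 address.

    Returns the recorded route and the resulting packet length in bytes.
    """
    record = []
    for router in path:
        # Parse to validate the address, then "append" it; each hop adds ADDR_BYTES.
        record.append(str(ipaddress.IPv4Address(router)))
        packet_len += ADDR_BYTES
    return record, packet_len


def useful_fraction(payload_len: int, hops: int) -> float:
    """Share of the packet that is still original data after `hops` appended addresses."""
    return payload_len / (payload_len + ADDR_BYTES * hops)


if __name__ == "__main__":
    route = [f"10.0.0.{i}" for i in range(1, 16)]  # 15 hypothetical hops
    record, size = append_path_record(40, route)
    print(size, useful_fraction(40, len(route)))  # 100 0.4
```

A 40-byte packet crossing 15 routers grows to 100 bytes, so only 40% of what is transmitted is original data.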
Benchmarks • Effect on throughput • Amount of overhead added to the packets • False positive rate • Percentage of paths traced back to the incorrect source • Computational intensity • Time required to trace an attack • Amount of data required to trace an attack • CPU/memory usage on router
Benchmarks • Traceback’s effect on network • Does it flood? • Susceptibility to spoofing • Collisions • For hash-based traceback methods
Some Assumptions • Attackers can create/spoof any packet • Packets from an attack may take different routes to victim • Attacker-victim routes are stable • Routers are not compromised
Packet Marking • Add information to the packets so that paths can be retraced to original source • Methods for marking packets • Probabilistic • Node Marking • Edge Marking • Deterministic
Probabilistic Packet Marking (PPM) • Using probability, router marks a packet • With router IP address (node marking) • With edge of paths (edge marking) • Node marking • 95% accuracy, requires ~300,000 packets • Edge marking • More state information required, converges much faster
PPM Nodes • Each router writes its address in a 32-bit field only with probability p • Address field can be overwritten by routers closer to the victim • Probability of seeing the mark of a router d hops away is $p(1-p)^{d-1}$ • Need many packets before we see a mark from a distant router
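The formula $p(1-p)^{d-1}$ can be checked with a small Monte Carlo sketch of node marking. It assumes a single fixed path and a single node field that every marking router overwrites; the router names and helper functions are illustrative only.

```python
import random
from collections import Counter
from typing import Optional


def node_sample(path, p, rng):
    """Send one packet along `path` (attacker side first, victim's neighbor last).

    Each router overwrites the single node field with its own address with
    probability p, so the victim sees the last router that chose to mark.
    """
    mark: Optional[str] = None
    for router in path:
        if rng.random() < p:
            mark = router  # replaces any mark written farther from the victim
    return mark


def observed_rates(path, p, packets, seed=0):
    """Fraction of packets whose mark names each router."""
    rng = random.Random(seed)
    counts = Counter(node_sample(path, p, rng) for _ in range(packets))
    return {router: counts[router] / packets for router in path}


def predicted_rate(p, d):
    """p(1-p)^(d-1): the router d hops away marks, and the d-1 nearer routers do not."""
    return p * (1 - p) ** (d - 1)


if __name__ == "__main__":
    path = ["R5", "R4", "R3", "R2", "R1"]  # R1 is 1 hop from the victim
    rates = observed_rates(path, p=0.2, packets=100_000)
    for d, router in enumerate(reversed(path), start=1):
        print(router, round(rates[router], 4), round(predicted_rate(0.2, d), 4))
```

With p = 0.2 the farthest router (R5, five hops out) appears in only about 8% of packets, which is why distant routers need many packets before their mark is seen.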
PPM Nodes – Pros • Not every packet is marked • Lower overhead on routers • Higher throughput (packet size remains small) • Fixed space is required for the packets • Packet size + 32 bits
PPM Nodes - Cons • Large number of false positives • DDoS with 25 hosts requires several days and has thousands of false positives • Slow convergence rate • For 95% success, we need 300,000 packets • Attacker can still inject modified packets into PPM network (mark spoofing) • The 300,000-packet figure is for a single attacker
PPM Edge Sampling • Reserve distance field and two 32-bit address fields (“start” and “end”) • If router decides to mark a packet, writes its address in “start” field and zeroes the distance field • When a router sees a zero in the distance field, it writes its address in the “end” field • If a router decides not to mark a packet, increments distance field • Must use saturating addition (distance field has limit)
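The marking rule above, plus a victim-side reconstruction for a single path, can be sketched as follows. This is a simplified illustration, not a wire-format implementation: the fields are plain Python values, `DIST_MAX = 31` assumes a 5-bit distance field, the attacker is assumed to send packets with empty fields, and all names are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Optional

DIST_MAX = 31  # assumption: 5-bit distance field, so increments saturate at 31


@dataclass
class EdgeFields:
    start: Optional[str] = None
    end: Optional[str] = None
    distance: int = 0


def edge_mark(router, pkt, p, rng):
    """Marking step performed by `router` on one packet."""
    if rng.random() < p:
        pkt.start = router  # begin a new edge here
        pkt.distance = 0
    else:
        if pkt.distance == 0:
            pkt.end = router  # close the edge started one hop upstream
        pkt.distance = min(pkt.distance + 1, DIST_MAX)  # saturating addition


def send(path, p, rng):
    """Forward one packet along `path` (attacker side first)."""
    pkt = EdgeFields()
    for router in path:
        edge_mark(router, pkt, p, rng)
    return pkt


def reconstruct_path(marks, victim="V"):
    """Chain sampled edges outward from the victim, ordered by distance."""
    upstream = {}  # (end, distance) -> start
    for m in marks:
        if m.start is None:
            continue  # no router marked this packet
        end = victim if m.distance == 0 else m.end  # distance 0: edge ends at victim
        upstream[(end, m.distance)] = m.start
    path, node, d = [victim], victim, 0
    while (node, d) in upstream:
        node = upstream[(node, d)]
        path.append(node)
        d += 1
    return path
```

Each sampled edge carries its distance from the victim, so the victim can chain edges in order even though they arrive in random order.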
PPM Edge Sampling • Expected number of packets needed to reconstruct a path of length d is at most $\ln(d)/\left(p(1-p)^{d-1}\right)$ • Requires fewer packets than node marking • Edge sampling allows reconstruction of the whole attack tree • Packets carry additional overhead • Encoding start, end, and distance breaks compatibility with networks not using PPM
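The bound is easy to evaluate numerically. This sketch computes it and grid-searches the marking probability; since $p(1-p)^{d-1}$ is maximized at $p = 1/d$, the search should land there. The function names are hypothetical, and rejecting d = 1 is a design choice (the bound is degenerate there).

```python
import math


def edge_sampling_bound(d: int, p: float) -> float:
    """ln(d) / (p(1-p)^(d-1)): bound on expected packets for a d-hop path."""
    if d < 2:
        raise ValueError("bound is degenerate for d < 2")
    return math.log(d) / (p * (1 - p) ** (d - 1))


def best_marking_probability(d: int, steps: int = 1000) -> float:
    """Grid-search p in (0, 1) minimizing the bound; analytically p = 1/d."""
    candidates = [i / steps for i in range(1, steps)]
    return min(candidates, key=lambda p: edge_sampling_bound(d, p))


if __name__ == "__main__":
    print(round(edge_sampling_bound(10, 0.1), 2))  # 59.43
    print(best_marking_probability(10))            # 0.1
```

For a 10-hop path with p = 0.1 the bound is roughly 59 packets, far fewer than node marking needs.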
Deterministic Packet Marking (DPM) • Every packet is marked • Spoofed marks are overwritten with correct marks
DPM • Incoming packets are marked • Outgoing packets are unaltered • Requires more overhead than PPM • Less computation required • Probability of generating ingress IP address $(1-p)^{d-1}$
DPM • The 32-bit ingress address is split into two 16-bit fields (bits 0-15 and 16-31) plus a flag • Each packet carries one of the two fields, chosen with probability 0.5 • The flag is set to 1 when the packet carries the higher-order bits • Only part of the address is available to the attacker • Can be made more secure by using non-uniform probability distributions
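The split-address idea can be sketched as a marking function at the ingress router and a reassembly function at the victim. The mark is modeled as a `(flag, 16-bit half)` tuple rather than real header bits, "bits 0-15" is taken to mean the low-order bits, and the helper names are hypothetical.

```python
import ipaddress
import random


def dpm_mark(ingress_ip: str, rng: random.Random) -> tuple:
    """Ingress router: stamp one half of its address, chosen with probability 0.5."""
    addr = int(ipaddress.IPv4Address(ingress_ip))
    if rng.random() < 0.5:
        return (0, addr & 0xFFFF)          # flag 0: low-order bits 0-15
    return (1, (addr >> 16) & 0xFFFF)      # flag 1: high-order bits 16-31


def dpm_reconstruct(marks):
    """Victim: rebuild the ingress address once both halves have been seen."""
    halves = {}
    for flag, value in marks:
        halves[flag] = value
        if len(halves) == 2:
            return str(ipaddress.IPv4Address((halves[1] << 16) | halves[0]))
    return None  # only one half seen so far
```

This assumes every packet comes through the same ingress router. With several attackers, halves from different ingress routers would be mixed, and this sketch does not attempt to separate them.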
DPM • Claimed to have 0 false positives • Claimed to converge very quickly • 99% probability of success with 7 packets • 99.9% probability of success with only 10 packets • Has not been tested on large networks • Cannot deal with NAT
HASH-BASED TRACEBACK Source Path Isolation Engine (SPIE)
SPIE - Overview • Each router along a packet’s transmission path computes a set of hash codes (digests) for each packet • The time-tagged digests are stored in router memory for some time period • Limited by available router resources • Traceback is initiated only by “authenticated agent requests” to the SPIE Traceback Manager (STM) • Executed by means of a broadcast message • Results in the construction of a complete attack graph within the STM
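The time-tagged storage described above can be sketched as one digest table per time window, with the oldest windows evicted once memory is full. This is a hypothetical illustration: the class name, window size, and 8-byte truncated SHA-256 digest are assumptions, and real SPIE routers use Bloom filters rather than Python sets.

```python
import hashlib
from collections import OrderedDict


class TimeTaggedDigests:
    """Hypothetical DGA-style store: one digest table per time window.

    Only the newest `max_windows` tables are kept, so a packet can be traced
    only while its window is still in memory (the router-resource limit).
    """

    def __init__(self, window_seconds: float, max_windows: int):
        self.window_seconds = window_seconds
        self.max_windows = max_windows
        self.tables = OrderedDict()  # window index -> set of digests

    def _window(self, t: float) -> int:
        return int(t // self.window_seconds)

    @staticmethod
    def _digest(packet_prefix: bytes) -> bytes:
        # Keep a short digest only; the packet and its payload are not stored.
        return hashlib.sha256(packet_prefix).digest()[:8]

    def record(self, packet_prefix: bytes, now: float) -> None:
        # Assumes `now` never decreases, so insertion order is chronological.
        self.tables.setdefault(self._window(now), set()).add(self._digest(packet_prefix))
        while len(self.tables) > self.max_windows:
            self.tables.popitem(last=False)  # evict the oldest window

    def seen(self, packet_prefix: bytes, when: float) -> bool:
        table = self.tables.get(self._window(when))
        return table is not None and self._digest(packet_prefix) in table
```

A traceback query has to name the approximate time the packet was received, and it fails once that window has been evicted.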
SPIE - Assumptions • Packets may be addressed to multiple destinations • Attackers are aware they are being traced • Routers may be subverted, but not often • Routing within the network may be unstable • Traceback must deal with divergent paths • Packet size should not grow as a result of traceback • 1 byte increase in size = 1% increase in resource use • Very controversial … self-enabling assumption • End hosts may be resource constrained • Traceback is an infrequent operation • Broadcast messages can have a significant impact on Internet performance • Traceback should return the entire path, not just the source
SPIE - Architecture • DGA (Data Generation Agent) • Resides in SPIE-enhanced routers to produce digests and store them in time-stamped digest tables • Implemented as software agents, interface cards, or dedicated auxiliary boxes • STM (SPIE Traceback Manager) • Controls the SPIE system • Verifies the authenticity of a traceback request, dispatches the request to the appropriate SCARs, gathers regional attack graphs, and assembles the complete attack graph • SCAR (SPIE Collection and Reduction Agent) • Data concentration point for a regional area • When traceback is requested, SCARs initiate a broadcast request for traceback and produce regional attack graphs based on data from their constituent DGAs
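The attack-graph assembly can be sketched as a reverse-path walk. Starting at the victim's router, the walk asks each upstream neighbor's DGA whether it saw the packet and continues only through routers that answer yes. This is a centralized simplification that assumes the manager knows the topology and can query every DGA directly instead of going through regional SCARs. `build_attack_graph`, `upstream`, and `saw_packet` are hypothetical names.

```python
from collections import deque


def build_attack_graph(victim_router, upstream, saw_packet):
    """Reverse-path walk from the victim's router toward the packet's source(s).

    upstream: dict mapping each router to its upstream neighbors (assumed known).
    saw_packet: callable(router) -> bool, standing in for that router's DGA query.
    Returns the set of (upstream_router, downstream_router) edges the packet crossed.
    """
    edges = set()
    visited = {victim_router}
    queue = deque([victim_router])
    while queue:
        node = queue.popleft()
        for neighbor in upstream.get(node, ()):
            if saw_packet(neighbor):
                edges.add((neighbor, node))
                if neighbor not in visited:
                    visited.add(neighbor)
                    queue.append(neighbor)
    return edges
```

Because the walk stops at routers that report no match, only the branches the packet actually traversed end up in the graph. Divergent paths, such as copies of a packet taking several routes, appear as a tree rather than a single chain.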
SPIE - Hashing • [Figure: packet fields used for hashing; masked (gray) areas are NOT used in the hash-code calculation; LAN 0.139%, WAN 0.00092%] • Multiple hash codes (over different groupings of fields) are calculated for each packet, based on 24 relatively invariant fields of the first 32 bytes of the packet • A packet was received if all of its hashes are positive • Hash functions can be simple (no cryptographic hardness required) and relatively fast
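The "received only if all hashes are positive" test is the membership check of a Bloom filter, which is what SPIE uses for its digest tables. The sketch below masks per-hop fields before hashing, then sets and checks k bits. Several details are assumptions: the masked offsets (IPv4 TOS, TTL, and header checksum with no options), the filter size, and the use of k index-prefixed SHA-256 hashes over the same bytes instead of different groupings of fields. SHA-256 is used only for convenience, since no cryptographic hardness is required.

```python
import hashlib

PREFIX_LEN = 32                   # bytes hashed, following the slide's "first 32 bytes"
MUTABLE_OFFSETS = (1, 8, 10, 11)  # assumption: IPv4 TOS, TTL, header checksum (no options)


def invariant_prefix(packet: bytes) -> bytes:
    """Copy the first PREFIX_LEN bytes with per-hop fields zeroed out."""
    prefix = bytearray(packet[:PREFIX_LEN])
    for offset in MUTABLE_OFFSETS:
        if offset < len(prefix):
            prefix[offset] = 0
    return bytes(prefix)


class DigestBloomFilter:
    def __init__(self, m_bits: int = 1 << 16, k: int = 3):
        self.m, self.k = m_bits, k
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, packet: bytes):
        data = invariant_prefix(packet)
        for i in range(self.k):
            # k hash functions derived by prefixing an index byte.
            h = hashlib.sha256(bytes([i]) + data).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, packet: bytes) -> None:
        for pos in self._positions(packet):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def probably_seen(self, packet: bytes) -> bool:
        # True only if all k bits are set: false positives possible, no false negatives.
        return all((self.bits[pos // 8] >> (pos % 8)) & 1 for pos in self._positions(packet))
```

Because the mutable fields are zeroed before hashing, a router downstream still matches the digest even though the TTL and checksum have changed in transit.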
SPIE – Implementation Issues • PRO • Single-packet tracing is feasible • Automated processing by SPIE-enhanced routers makes spoofing difficult, at best • Relatively low storage required • Only digests and times are stored • Does not aid eavesdropping on payload data • Payload is not stored • CON • Requires specially configured (SPIE-enhanced) routers • Probability of detection is directly related to the number of SPIE-enhanced routers in the network in question • Router storage limits the window of time in which a packet can be successfully traced • Some filtering of which packets are digested may be needed • May give the appearance of a loss of anonymity across the Internet
Conclusions • DoS, worms, viruses continuously becoming more dangerous • Attacks must be shut down quickly and be traceable • Integrating traceback into next generation Internet is critical
Conclusions • Probabilistic Packet Marking • Keeps low packet overhead • Not 100% accurate, traceback is slow • Deterministic Packet Marking • No false positives • Much higher packet overhead, needs more testing • Hash-based Traceback • No packet overhead • New, more capable routers
Conclusions • Cooperation is required • Routers must be built to handle new tracing protocols • ISPs must provide compliance with protocols • Internet is no longer anonymous • Some issues must still be solved • NATs • Collisions
References • Belenky, A., Ansari, N. “IP Traceback with Deterministic Packet Marking”. IEEE Communications Letters, April 2003. • Savage, S., et al. “Practical Network Support for IP Traceback”. Department of Computer Science, University of Washington. • Snoeren, A., Partridge, C., et al. “Single-Packet IP Traceback”. IEEE/ACM Transactions on Networking, December 2002.