
Automated Worm Fingerprinting

This paper addresses the problem of reacting quickly to worms and introduces Earlybird, an automated worm fingerprinting system. By sifting network content for invariant substrings that recur from many sources to many destinations, Earlybird automatically extracts signatures of new worms and uses them for containment. The paper also surveys existing worm detection and containment methods.

Presentation Transcript


  1. Automated Worm Fingerprinting Sumeet Singh, Cristian Estan, George Varghese, and Stefan Savage

  2. Introduction • Problem: how to react quickly to worms? • CodeRed 2001 • Infected ~360,000 hosts within 11 hours • Sapphire/Slammer (376 bytes) 2003 • Infected ~75,000 hosts within 10 minutes

  3. Existing Approaches • Detection • Ad hoc intrusion detection • Characterization • Manual signature extraction • Isolates and decompiles a new worm • Look for and test unique signatures • Can take hours or days

  4. Existing Approaches • Containment • Updates to anti-virus and network filtering products

  5. Earlybird • Automatically detect and contain new worms • Two observations • Some portion of the content in existing worms is invariant • Rare to see the same string recurring from many sources to many destinations

  6. Earlybird • Automatically extract the signature of all known worms • Also Blaster, MyDoom, and Kibuv.B hours or days before any public signatures were distributed • Few false positives

  7. Background and Related Work • Slammer scanned almost the entire IPv4 address space in under 10 minutes • Spread was limited only by bandwidth constraints

  8. The SQL Slammer Worm: 30 Minutes After “Release” • Infections doubled every 8.5 seconds • Spread 100X faster than Code Red • At peak, scanned 55 million hosts per second

  9. Network Effects Of The SQL Slammer Worm • At the height of infections • Several ISPs noted significant bandwidth consumption at peering points • Average packet loss approached 20% • South Korea lost almost all Internet service for period of time • Financial ATMs were affected • Some airline ticketing systems overwhelmed

  10. Signature-Based Methods • Pretty effective if signatures can be generated quickly • For CodeRed, 60 minutes • For Slammer, 1 – 5 minutes

  11. Worm Detection • Three classes of methods • Scan detection • Honeypots • Behavioral techniques

  12. Scan Detection • Look for unusual frequency and distribution of address scanning • Limitations • Not suited to worms that spread in a non-random fashion (e.g. email, IM, P2P apps) • Based on a target list • Spread topologically

  13. Scan Detection • More limitations • Detects infected sites • Does not produce a signature

  14. Honeypots • Monitored idle hosts with unpatched vulnerabilities • Used to isolate worms • Limitations • Manual extraction of signatures • Depend on quick infection of the honeypot

  15. Behavioral Detection • Looks for unusual system call patterns • Sending a packet from the same buffer containing a received packet • Can detect slow moving worms • Limitations • Needs application-specific knowledge • Cannot infer a large-scale outbreak

  16. Characterization • Process of analyzing and identifying a new worm • Current approaches • Use a priori vulnerability signatures • Automated signature extraction

  17. Vulnerability Signatures • Example • Slammer Worm • UDP traffic on port 1434 that is longer than 100 bytes (buffer overflow) • Can be deployed before the outbreak • Can only be applied to well-known vulnerabilities
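
The a priori Slammer signature on this slide amounts to a trivial per-packet check. A minimal sketch, assuming a simple dict-based packet representation chosen purely for illustration:

```python
# Hypothetical packet representation (an assumption for illustration):
# a dict with protocol, destination port, and raw payload bytes.

def matches_slammer_signature(packet):
    """Return True if the packet matches the a priori Slammer signature:
    UDP traffic to port 1434 with a payload longer than 100 bytes."""
    return (
        packet["proto"] == "udp"
        and packet["dst_port"] == 1434
        and len(packet["payload"]) > 100
    )

# A Slammer-sized probe (the worm was 376 bytes) trips the signature;
# a short, legitimate SQL resolution request does not.
probe = {"proto": "udp", "dst_port": 1434, "payload": b"\x04" + b"A" * 375}
query = {"proto": "udp", "dst_port": 1434, "payload": b"\x04MyInstance"}
print(matches_slammer_signature(probe), matches_slammer_signature(query))
```

Such a rule can be deployed before any outbreak, which is exactly its strength; the weakness noted above is that it presupposes the vulnerability is already well known.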

  18. Some Automated Signature Extraction Techniques • Allows viruses to infect decoy programs • Extracts the modified regions of the decoy • Uses heuristics to identify invariant code strings across infected instances

  19. Some Automated Signature Extraction Techniques • Limitation • Assumes the presence of a virus in a controlled environment

  20. Some Automated Signature Extraction Techniques • Honeycomb • Finds longest common subsequences among sets of strings found in messages • Autograph • Uses network-level data to infer worm signatures • Limitations • Scalability and fully distributed deployment

  21. Containment • Mechanism used to deter the spread of an active worm • Host quarantine • Via IP ACLs on routers or firewalls • String-matching • Connection throttling • On all outgoing connections

  22. Host Quarantine • Preventing an infected host from talking to other hosts • Via IP ACLs on routers or firewalls

  23. Defining Worm Behavior • Content invariance • Portions of a worm are invariant (e.g. the decryption routine) • Content prevalence • Appears frequently on the network • Address dispersion • Distribution of destination addresses more uniform to spread fast

  24. Finding Worm Signatures • Traffic pattern is sufficient for detecting worms • Relatively straightforward • Extract substrings of all possible lengths • Raise an alarm when • FrequencyCounter[substring] > threshold1 • SourceCounter[substring] > threshold2 • DestCounter[substring] > threshold3
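
The alarm rule on this slide can be sketched naively with exact per-substring counters. The thresholds and substring width below are illustrative assumptions; the paper replaces these exact tables with the scalable structures introduced on the following slides:

```python
from collections import defaultdict

# Illustrative thresholds (assumptions, not the paper's tuned values).
PREVALENCE_T, SRC_T, DST_T = 3, 2, 2

freq = defaultdict(int)       # FrequencyCounter[substring]
sources = defaultdict(set)    # distinct sources seen per substring
dests = defaultdict(set)      # distinct destinations seen per substring

def sift(src, dst, payload, width=8):
    """Update counters for every width-byte substring of the payload;
    return the substrings that currently exceed all three thresholds."""
    alarms = []
    for i in range(len(payload) - width + 1):
        s = payload[i:i + width]
        freq[s] += 1
        sources[s].add(src)
        dests[s].add(dst)
        if (freq[s] > PREVALENCE_T
                and len(sources[s]) > SRC_T
                and len(dests[s]) > DST_T):
            alarms.append(s)
    return alarms

# Demo: the same payload recurring across distinct source/destination
# pairs eventually trips all three thresholds.
for i in range(4):
    suspicious = sift(f"10.0.0.{i}", f"10.0.1.{i}", b"WORMBODYXYZ")
print(bool(suspicious))  # True once prevalence and dispersion are exceeded
```

The catch, which the next slides address, is memory: exact tables over all substrings of all traffic do not fit at line rate.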

  25. Practical Content Sifting • Characteristics • Small processing requirements • Small memory requirements • Allows arbitrary deployment strategies

  26. Estimating Content Prevalence • Finding the packet payloads that appear at least x times among the N packets sent • During a given interval

  27. Estimating Content Prevalence • Table[payload] • 1 GB table filled in 10 seconds • Table[hash[payload]] • 1 GB table filled in 4 minutes • Tracking millions of ants to track a few elephants • Collisions...false positives

  28.–36. Multistage Filters (animation across several slides) • Each packet’s content is hashed independently at every stage into an array of counters in packet memory • Collisions are OK • Only content whose counter reaches the threshold in every stage is inserted into the prevalence table • No false negatives! (guaranteed detection) [Singh et al. 2002]
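
A minimal sketch of the multistage filter walked through above. A prevalent string necessarily drives all of its counters past the threshold (no false negatives), while a rare string causes a false positive only if it collides in every stage at once. The stage count, array size, and threshold here are illustrative assumptions:

```python
import hashlib

# Illustrative parameters (assumptions, not the paper's configuration).
STAGES, SIZE, THRESHOLD = 4, 1024, 3

counters = [[0] * SIZE for _ in range(STAGES)]

def bucket(stage, content):
    """Independent hash per stage: salt the hash with the stage number."""
    h = hashlib.sha256(bytes([stage]) + content).digest()
    return int.from_bytes(h[:4], "big") % SIZE

def insert(content):
    """Count this content; return True once its counter in EVERY stage
    has reached the threshold (i.e. it belongs in the prevalence table)."""
    passed = True
    for s in range(STAGES):
        idx = bucket(s, content)
        counters[s][idx] += 1
        passed = passed and counters[s][idx] >= THRESHOLD
    return passed
```

With independent stages, unrelated strings that collide in one stage almost never collide in all of them, which is what keeps the false-positive rate low at small memory cost.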

  37.–39. Conservative Updates (animation; gray = all prior packets) • On each update, raise only the counter(s) currently at the minimum • Incrementing the larger counters is redundant, since the filter’s decision depends on the minimum across stages
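
The conservative-update refinement can be sketched on top of the same multistage structure: instead of incrementing every stage's counter, raise only those below the new minimum. Parameters remain illustrative assumptions:

```python
import hashlib

# Illustrative parameters (assumptions).
STAGES, SIZE = 4, 256
counters = [[0] * SIZE for _ in range(STAGES)]

def bucket(stage, content):
    """Independent per-stage hash, salted with the stage number."""
    h = hashlib.sha256(bytes([stage]) + content).digest()
    return int.from_bytes(h[:4], "big") % SIZE

def conservative_insert(content):
    """Conservative update: compute the minimum counter across stages,
    then raise only the counters below (minimum + 1). Incrementing the
    larger counters would be redundant, because the filter's estimate
    for this content is the minimum across stages."""
    idx = [bucket(s, content) for s in range(STAGES)]
    values = [counters[s][idx[s]] for s in range(STAGES)]
    new_min = min(values) + 1
    for s in range(STAGES):
        if values[s] < new_min:
            counters[s][idx[s]] = new_min
    return new_min  # current estimate for this content
```

By never inflating counters beyond what the minimum justifies, colliding strings pollute each other's buckets less, further reducing false positives without giving up the no-false-negative guarantee.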

  40. Detecting Common Strings • Cannot afford to detect all substrings • Maybe can afford to detect all strings with a small fixed length

  41. Detecting Common Strings • Cannot afford to detect all substrings • Maybe can afford to detect all strings with a small fixed length A horse is a horse, of course, of course F1 = (c1·p^4 + c2·p^3 + c3·p^2 + c4·p^1 + c5) mod M

  42. Detecting Common Strings • Cannot afford to detect all substrings • Maybe can afford to detect all strings with a small fixed length A horse is a horse, of course, of course F2 = (c2·p^4 + c3·p^3 + c4·p^2 + c5·p^1 + c6) mod M F1 = (c1·p^4 + c2·p^3 + c3·p^2 + c4·p^1 + c5) mod M

  43. Detecting Common Strings • Cannot afford to detect all substrings • Maybe can afford to detect all strings with a small fixed length A horse is a horse, of course, of course F2 = (c2·p^4 + c3·p^3 + c4·p^2 + c5·p^1 + c6) mod M = (c1·p^5 + c2·p^4 + c3·p^3 + c4·p^2 + c5·p^1 + c6 − c1·p^5) mod M = (p·F1 + c6 − c1·p^5) mod M F1 = (c1·p^4 + c2·p^3 + c3·p^2 + c4·p^1 + c5) mod M

  44. Detecting Common Strings • Cannot afford to detect all substrings • Maybe can afford to detect all strings with a small fixed length • Still too expensive…
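
The incremental fingerprint built up over the preceding slides can be sketched directly: each window's fingerprint is derived from the previous one in O(1) via F2 = (p·F1 + c6 − c1·p^w) mod M. The constants p, M, and the window width below are illustrative assumptions, not the paper's actual parameters:

```python
# Illustrative constants (assumptions): base p, modulus M, window width.
P, M, WIDTH = 101, (1 << 31) - 1, 5

def fingerprint(window):
    """Direct O(WIDTH) fingerprint: (c1*p^(w-1) + ... + cw) mod M,
    evaluated by Horner's rule."""
    f = 0
    for c in window:
        f = (f * P + c) % M
    return f

def roll(f, outgoing, incoming):
    """Slide the window one byte in O(1): multiply by p, drop the
    outgoing byte's p^WIDTH term, and append the incoming byte."""
    return (f * P + incoming - outgoing * pow(P, WIDTH, M)) % M

data = b"A horse is a horse, of course, of course"
f = fingerprint(data[:WIDTH])
for i in range(WIDTH, len(data)):
    f = roll(f, data[i - WIDTH], data[i])
    # Each rolled value matches a from-scratch recomputation.
    assert f == fingerprint(data[i - WIDTH + 1 : i + 1])
```

Even at O(1) per shift, fingerprinting every window of every packet is the expense the slide's closing remark refers to, which motivates sampling in the full system.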

  45. Estimating Address Dispersion • Not sufficient to count the number of source and destination pairs • e.g. send a mail to a mailing list • Two sources—mail server and the sender • Many destinations • Need to count the unique source and destination traffic flows • For each substring

  46. Bitmap counting – direct bitmap Set bits in the bitmap using hash of the flow ID of incoming packets HASH(green)=10001001 [Estan et al. 2003]

  47. Bitmap counting – direct bitmap Different flows have different hash values HASH(blue)=00100100

  48. Bitmap counting – direct bitmap Packets from the same flow always hash to the same bit HASH(green)=10001001

  49. Bitmap counting – direct bitmap Collisions OK, estimates compensate for them HASH(violet)=10010101

  50. Bitmap counting – direct bitmap HASH(orange)=11110011
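
The direct-bitmap counting shown across the slides above can be sketched compactly: hash each flow ID to one bit, let repeats from the same flow land on the same bit, and estimate the distinct-flow count from the fraction of bits still zero (n ≈ b·ln(b/z) for a b-bit map with z zero bits, which compensates for collisions). The bitmap size is an illustrative assumption:

```python
import hashlib
import math

B = 64  # bits in the map (an illustrative assumption)

bitmap = set()  # indices of set bits

def bit_for(flow_id):
    """Hash a flow ID to a bit position; the same flow always maps
    to the same bit, so duplicates are counted once."""
    h = hashlib.sha256(flow_id.encode()).digest()
    return int.from_bytes(h[:4], "big") % B

def record(flow_id):
    bitmap.add(bit_for(flow_id))

def estimate():
    """Estimate distinct flows from the zero-bit count z: B * ln(B / z).
    (Breaks down once every bit is set; a larger map is then needed.)"""
    z = B - len(bitmap)
    return B * math.log(B / z)

for i in range(20):
    record(f"10.0.0.{i}->10.0.1.1")   # 20 distinct flows
    record(f"10.0.0.{i}->10.0.1.1")   # duplicates change nothing
print(estimate())  # statistically close to the true count of 20
```

This is the per-substring dispersion counter Earlybird needs: tiny, insensitive to repeated packets from the same flow, and accurate enough to compare against a threshold.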
