
Automated Worm Fingerprinting

Automated Worm Fingerprinting. Sumeet Singh, Cristian Estan, George Varghese, and Stefan Savage. Introduction. Problem: how to react quickly to worms? Code Red (2001) infected ~360,000 hosts; Sapphire/Slammer (376 bytes, 2003) infected ~75,000 hosts.

Presentation Transcript


  1. Automated Worm Fingerprinting Sumeet Singh, Cristian Estan, George Varghese, and Stefan Savage

  2. Introduction • Problem: how to react quickly to worms? • Code Red, 2001 • Infected ~360,000 hosts • Sapphire/Slammer (376 bytes), 2003 • Infected ~75,000 hosts

  3. The SQL Slammer Worm: 30 Minutes After “Release” - Infections doubled every 8.5 seconds - Spread 100X faster than Code Red - At peak, scanned 55 million IP addresses per second. This information is from the Cooperative Association for Internet Data Analysis and the University of California at San Diego.

  4. Network Effects Of The SQL Slammer Worm • At the height of infections • Several ISPs noted significant bandwidth consumption at peering points • Average packet loss approached 20% • South Korea lost almost all Internet service for a period of time • Financial ATMs were affected • Some airline ticketing systems were overwhelmed

  5. Existing Approaches • Detection • Ad hoc intrusion detection • Characterization • Manual signature extraction • Isolate and decompile a new worm • Look for and test unique signatures • Can take hours or days

  6. Earlybird – Content Sifting • Automatically detect and contain new worms • Two observations • Content Invariance - Some portion of the content in existing worms is invariant • Address Dispersion - Rare to see the same string recurring from many sources to many destinations

  7. Content Invariance [Figure: two packets, P1 and P2, with payloads “RANDABCDOM” and “RABCDADCDE”; both contain the substring “ABCD”]

  8. Address Dispersion

  9. The menace

  10. Worm Detection • Three classes of methods • Scan detection • Honeypots • Behavioral techniques

  11. Scan Detection • Most worms select targets using a random process • May target unused address space

  12. Network Telescopes • Can be used to detect large-scale widespread attacks on the internet

  13. Network Telescopes • Can be used to detect large-scale widespread attacks on the internet

  14. Scan Detection - Limitations • What if worms spread in a non-random fashion? • Passive system: cannot identify the worm signature.

  15. Honeypots • Monitored idle hosts with untreated vulnerabilities • Used to isolate worms

  16. Honeypots - Limitations • Manual extraction of signatures • What if the honeypot is not infected?

  17. Behavioral Detection • Monitor and analyze the internals of a computing system • Which program should have access to which resources • Looks at the state of the system • Can detect slow moving worms

  18. Behavioral Detection - Limitations • Needs application-specific knowledge • Cannot infer a large-scale outbreak

  19. Containment • Host quarantine • Preventing an infected host from talking to other hosts • String-matching • Connection throttling • On all outgoing connections

  20. Defining Worm Behavior • Content invariance • Portions of a worm are invariant (e.g. the decryption routine) • Content prevalence • The invariant content appears frequently on the network • Address dispersion • The number of distinct hosts infected by a worm grows over time

  21. The basic algorithm • Traffic pattern is sufficient for detecting worms • Relatively straightforward • Extract all possible substrings • Raise an alarm when • FrequencyCounter[substring] > threshold1 • SourceCounter[substring] > threshold2 • DestCounter[substring] > threshold3
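The alarm rule on this slide can be illustrated with a minimal sketch. The threshold values, the class name `NaiveContentSifter`, and the use of fixed-length substrings (rather than every possible substring) are assumptions made to keep the example small; the slide only names `threshold1`, `threshold2`, and `threshold3`.

```python
from collections import defaultdict

# Hypothetical values; the slide only names threshold1..threshold3.
PREVALENCE_THRESHOLD = 2
SOURCE_THRESHOLD = 2
DEST_THRESHOLD = 2
SUBSTRING_LEN = 4  # assumption: fixed-length substrings instead of all substrings


class NaiveContentSifter:
    """Exact (memory-hungry) version of the three counters on the slide."""

    def __init__(self):
        self.frequency = defaultdict(int)  # FrequencyCounter[substring]
        self.sources = defaultdict(set)    # distinct sources per substring
        self.dests = defaultdict(set)      # distinct destinations per substring

    def observe(self, src, dst, payload):
        """Update the tables for one packet and return the alarming substrings."""
        alarms = set()
        for i in range(len(payload) - SUBSTRING_LEN + 1):
            s = payload[i:i + SUBSTRING_LEN]
            self.frequency[s] += 1
            self.sources[s].add(src)
            self.dests[s].add(dst)
            if (self.frequency[s] > PREVALENCE_THRESHOLD
                    and len(self.sources[s]) > SOURCE_THRESHOLD
                    and len(self.dests[s]) > DEST_THRESHOLD):
                alarms.add(s)
        return alarms


sifter = NaiveContentSifter()
sifter.observe("A", "B", b"WORMCODE")
sifter.observe("C", "A", b"GET /index")
sifter.observe("B", "D", b"WORMCODE")
print(sifter.observe("D", "E", b"WORMCODE"))  # substrings seen from 3 sources to 3 destinations
```

Keeping exact sets and one counter per substring is what makes this version impractical at line rate, which is the motivation for the approximate data structures on the later slides.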

  22. The basic algorithm • Detector in network (hosts A, B, C, D, E and cnn.com) • Prevalence Table and Address Dispersion Table (Sources, Destinations): no entries shown

  23. The basic algorithm • Detector in network (hosts A, B, C, D, E and cnn.com) • Entry 1: prevalence 1, sources 1 (A), destinations 1 (B)

  24. The basic algorithm • Detector in network (hosts A, B, C, D, E and cnn.com) • Entry 1: prevalence 1, sources 1 (A), destinations 1 (B) • Entry 2: prevalence 1, sources 1 (C), destinations 1 (A)

  25. The basic algorithm • Detector in network (hosts A, B, C, D, E and cnn.com) • Entry 1: prevalence 2, sources 2 (A, B), destinations 2 (B, D) • Entry 2: prevalence 1, sources 1 (C), destinations 1 (A)

  26. The basic algorithm • Detector in network (hosts A, B, C, D, E and cnn.com) • Entry 1: prevalence 3, sources 3 (A, B, D), destinations 3 (B, D, E) • Entry 2: prevalence 1, sources 1 (C), destinations 1 (A)

  27. The basic algorithm

  28. Practical Content Sifting • Characteristics • Small processing requirements • Small memory requirements

  29. Detecting Common Strings • Cannot afford to track all substrings • May be able to afford tracking all strings of a small fixed length • Example text: “An elephant is an elephant, of course, of course”

  30. Which substrings to index? • Approach 1: Index all substrings • Way too many substrings → too much computation → too much state • Approach 2: Index whole packet • Very fast but trivially evadable (e.g., Witty, email viruses) • Approach 3: Index all contiguous substrings of a fixed length ‘S’ • Can capture all signatures of length ‘S’ and larger • Example payload: A B C D E F G H I J K
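Approach 3 can be sketched with a sliding window whose hash is updated incrementally, so each new byte costs constant work. The polynomial rolling hash below is a stand-in chosen for illustration; this part of the transcript does not say which hash function is used, and the function name `window_hashes` and its parameters are hypothetical.

```python
def window_hashes(payload: bytes, s: int, base: int = 257, mod: int = (1 << 61) - 1):
    """Yield (offset, hash) for every contiguous s-byte window of payload."""
    if s <= 0 or len(payload) < s:
        return
    top = pow(base, s - 1, mod)  # weight of the byte that leaves the window
    h = 0
    for b in payload[:s]:
        h = (h * base + b) % mod
    yield 0, h
    for i in range(s, len(payload)):
        h = (h - payload[i - s] * top) % mod  # drop the oldest byte
        h = (h * base + payload[i]) % mod     # shift and append the new byte
        yield i - s + 1, h


# Two different packets that both carry the invariant content "ABCD"
# produce one hash value in common, regardless of its offset.
p1 = {h for _, h in window_hashes(b"RANDABCDOM", 4)}
p2 = {h for _, h in window_hashes(b"RABCDADCDE", 4)}
print(len(p1 & p2) >= 1)
```

Because the hash depends only on the window contents, the same substring maps to the same table key wherever it appears in a packet, which is what lets fixed-length indexing catch any signature of length S or larger.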

  31. Estimating Content Prevalence • Finding the packet payloads that appear at least x times among the N packets sent. • Use hash keys to index the table, rather than the whole payload.

  32. Multistage filters stream memory Array of counters Hash(Pink) [Singh et al. 2002]

  33. Multistage filters packet memory Array of counters Hash(Green)

  34. Multistage filters packet memory Array of counters Hash(Green)

  35. Multistage filters packet memory

  36. Multistage filters packet memory

  37. Multistage filters Reached threshold packet memory packet1 1 Insert

  38. Multistage filters packet memory packet1 1

  39. Multistage filters packet memory packet1 1 packet2 1

  40. Stage 2 Multistage Filters packet memory Stage 1 packet1 1 No false negatives! (guaranteed detection)
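The multistage filter walked through on the slides above can be sketched as follows. This is a minimal illustration under stated assumptions: the class name, the stage count, the table size, the threshold, and the use of salted BLAKE2b as the per-stage hash are all choices made here, not details taken from the slides.

```python
import hashlib


class MultistageFilter:
    """Several counter arrays, each indexed by an independent hash of the key."""

    def __init__(self, stages=3, counters_per_stage=1024, threshold=3):
        self.threshold = threshold
        self.size = counters_per_stage
        self.stages = [[0] * counters_per_stage for _ in range(stages)]
        self.memory = {}  # "packet memory": keys that passed every stage

    def _index(self, stage: int, key: bytes) -> int:
        # A different salt per stage gives an independent hash function per stage.
        digest = hashlib.blake2b(key, digest_size=8,
                                 salt=stage.to_bytes(2, "big")).digest()
        return int.from_bytes(digest, "big") % self.size

    def add(self, key: bytes) -> bool:
        """Count one occurrence; return True if the key is in packet memory."""
        if key in self.memory:
            self.memory[key] += 1
            return True
        passed = True
        for stage, counters in enumerate(self.stages):
            i = self._index(stage, key)
            counters[i] += 1
            if counters[i] < self.threshold:
                passed = False
        if passed:
            self.memory[key] = 1  # inserted with count 1, as on slide 37
        return passed


f = MultistageFilter()
for _ in range(3):
    reached = f.add(b"worm-substring-hash")
print(reached, f.memory)
```

A key's counters can only be inflated by collisions, never deflated, so any key seen at least `threshold` times reaches the threshold in every stage. That is the "no false negatives" guarantee on slide 40; false positives remain possible when unrelated keys collide in all stages.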

  41. Conservative Updates Gray = all prior packets

  42. Conservative Updates [Figure labels: “Redundant”, “Redundant”]

  43. Conservative Updates
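The conservative-update slides are mostly figures, so the following is a sketch of the usual formulation of the technique rather than a transcription of the slides: instead of incrementing every stage counter, raise each counter only as far as the smallest counter plus one. Increments beyond that are the "redundant" ones. The function name and the list-of-lists layout are assumptions.

```python
def conservative_update(stages, indexes):
    """Apply one conservative update and return the new minimum counter.

    stages  -- list of counter arrays, one per filter stage
    indexes -- the counter index this key hashes to in each stage
    """
    current = [stages[s][i] for s, i in enumerate(indexes)]
    target = min(current) + 1
    for s, i in enumerate(indexes):
        # Counters already at or above the target are left alone:
        # incrementing them would only add error for other keys.
        if stages[s][i] < target:
            stages[s][i] = target
    return target


stages = [[5, 0], [2, 0], [3, 0]]
new_min = conservative_update(stages, [0, 0, 0])
print(new_min, stages)  # 3 [[5, 0], [3, 0], [3, 0]]
```

The smallest counter is still an upper bound on the key's true count after the update, so the no-false-negatives property is preserved while counters inflated by other traffic grow more slowly.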

  44. Estimating Address Dispersion • Not sufficient to count the number of source and destination pairs • e.g. sending a mail to a mailing list • Two sources: the mail server and the sender • Many destinations • Need to track the distinct source and destination IP addresses • For each substring

  45. Bitmap counting – direct bitmap Set bits in the bitmap using hash of the flow ID of incoming packets HASH(green)=10001001 [Estan et al. 2003]

  46. Bitmap counting – direct bitmap Different flows have different hash values HASH(blue)=00100100

  47. Bitmap counting – direct bitmap Packets from the same flow always hash to the same bit HASH(green)=10001001

  48. Bitmap counting – direct bitmap HASH(violet)=10010101

  49. Bitmap counting – direct bitmap HASH(orange)=11110011

  50. Bitmap counting – direct bitmap HASH(pink)=11100000
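The direct bitmap shown on these slides can be sketched as below. The bitmap size, the class name, and the BLAKE2b hash are assumptions for illustration. The estimator `b * ln(b / z)`, with `b` the number of bits and `z` the number of zero bits, is the standard correction for hash collisions in a direct bitmap.

```python
import hashlib
import math


class DirectBitmap:
    """Estimate the number of distinct IDs with one bit per hash bucket."""

    def __init__(self, bits: int = 64):
        self.bits = bits
        self.bitmap = 0  # Python int used as a bit array

    def add(self, flow_id: bytes) -> None:
        digest = hashlib.blake2b(flow_id, digest_size=8).digest()
        position = int.from_bytes(digest, "big") % self.bits
        self.bitmap |= 1 << position  # same ID always sets the same bit

    def estimate(self) -> float:
        zeros = self.bits - bin(self.bitmap).count("1")
        if zeros == 0:
            return float("inf")  # bitmap saturated; a larger bitmap is needed
        return self.bits * math.log(self.bits / zeros)


sources = DirectBitmap()
for address in [b"10.0.0.1", b"10.0.0.2", b"10.0.0.1", b"10.0.0.3"]:
    sources.add(address)
print(round(sources.estimate(), 2))
```

Repeated packets from one address never change the bitmap, which is why this structure counts distinct sources or destinations rather than packets, at a cost of a few bytes per tracked substring.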
