790 likes | 923 Views
Automated Worm Fingerprinting. Sumeet Singh, Cristian Estan, George Varghese, and Stefan Savage. Introduction. Problem: how to react quickly to worms? CodeRed 2001 Infected ~360,000 hosts Sapphire/Slammer (376 bytes) 2002 Infected ~2Mn hosts.
E N D
Automated Worm Fingerprinting Sumeet Singh, Cristian Estan, George Varghese, and Stefan Savage
Introduction • Problem: how to react quickly to worms? • CodeRed 2001 • Infected ~360,000 hosts • Sapphire/Slammer (376 bytes) 2002 • Infected ~2Mn hosts
The SQL Slammer Worm:30 Minutes After “Release” - Infections doubled every 8.5 seconds - Spread 100X faster than Code Red - At peak, scanned 55 million hosts per second. This information is from the Cooperative Association for Internet Data Analysis and the University of California at San Diego.
Network Effects Of The SQL Slammer Worm • At the height of infections • Several ISPs noted significant bandwidth consumption at peering points • Average packet loss approached 20% • South Korea lost almost all Internet service for period of time • Financial ATMs were affected • Some airline ticketing systems overwhelmed
Existing Approaches • Detection • Ad hoc intrusion detection • Characterization • Manual signature extraction • Isolates and decompiles a new worm • Look for and test unique signatures • Can take hours or days
Earlybird – Content Sifting • Automatically detect and contain new worms • Two observations • Content Invariance - Some portion of the content in existing worms is invariant • Address Dispersion - Rare to see the same string recurring from many sources to many destinations
Content Invariance R A N D A B C D O M P1 P2 R A B C D A D C D E
Worm Detection • Three classes of methods • Scan detection • Honeypots • Behavioral techniques
Scan Detection • Most worms select targets using a random process • May target unused address space
Network Telescopes • Can be used to detect large-scale widespread attacks on the internet
Network Telescopes • Can be used to detect large-scale widespread attacks on the internet
Scan Detection - Limitations • What if worms spread in a non-random fashion. • Passive system – cannot identify worm signature.
Honeypots • Monitored idle hosts with untreated vulnerabilities • Used to isolate worms
Honeypots - Limitations • Limitations • Manual extraction of signatures • What if honeypot is not infected?
Behavioral Detection • Monitor and analyze the internals of a computing system • Which program should have access to which resources • Looks at the state of the system • Can detect slow moving worms
Behavioral Detection - Limitations • Needs application-specific knowledge • Cannot infer a large-scale outbreak
Containment • Host quarantine • Preventing an infected host from talking to other hosts • String-matching • Connection throttling • On all outgoing connections
Defining Worm Behavior • Content invariance • Portions of a worm are invariant (e.g. the decryption routine) • Content prevalence • Appears frequently on the network • Address dispersion • The number of distinct hosts in a worm grows over time
The basic algorithm • Traffic pattern is sufficient for detecting worms • Relatively straightforward • Extract all possible substrings • Raise an alarm when • FrequencyCounter[substring] > threshold1 • SourceCounter[substring] > threshold2 • DestCounter[substring] > threshold3
Detector in network B A C cnn.com E D Address Dispersion Table Sources Destinations Prevalence Table The basic algorithm
Detector in network B A C cnn.com E D Address Dispersion Table Sources Destinations Prevalence Table 1 1 (A) 1 (B) The basic algorithm
Detector in network B A C cnn.com E D Address Dispersion Table Sources Destinations Prevalence Table 1 1 (A) 1 (B) 1 1 (C) 1 (A) The basic algorithm
Detector in network B A C cnn.com E D Address Dispersion Table Sources Destinations Prevalence Table 2 2 (A,B) 2 (B,D) 1 1 (C) 1 (A) The basic algorithm
Detector in network B A C cnn.com E D Address Dispersion Table Sources Destinations Prevalence Table 3 3 (A,B,D) 3 (B,D,E) 1 1 (C) 1 (A) The basic algorithm
Practical Content Sifting • Characteristics • Small processing requirements • Small memory requirements
Detecting Common Strings • Cannot afford to detect all substrings • Maybe can afford to detect all strings with a small fixed length An elephant is an elephant, of course, of course
Which substrings to index? • Approach 1:Index all substrings • Way too many substrings too much computation too much state • Approach 2:Index whole packet • Very fast but trivially evadable (e.g., Witty, Email Viruses) • Approach 3: Index all contiguous substrings of a fixed length ‘S’ • Can capture all signatures of length ‘S’ and larger A B C D E F G H I J K
Estimating Content Prevalence • Finding the packet payloads that appear at least x times among the N packets sent. • Use hash keys to index the table, rather than the whole payload.
Multistage filters stream memory Array of counters Hash(Pink) [Singh et al. 2002]
Multistage filters packet memory Array of counters Hash(Green)
Multistage filters packet memory Array of counters Hash(Green)
Multistage filters packet memory
Multistage filters packet memory
Multistage filters Reached threshold packet memory packet1 1 Insert
Multistage filters packet memory packet1 1
Multistage filters packet memory packet1 1 packet2 1
Stage 2 Multistage Filters packet memory Stage 1 packet1 1 No false negatives! (guaranteed detection)
Conservative Updates Gray = all prior packets
Redundant Redundant Conservative Updates
Estimating Address Dispersion • Not sufficient to count the number of source and destination pairs • e.g. send a mail to a mailing list • Two sources—mail server and the sender • Many destinations • Need to track the distinct source and destination IP addresses • For each substring
Bitmap counting – direct bitmap Set bits in the bitmap using hash of the flow ID of incoming packets HASH(green)=10001001 [Estan et al. 2003]
Bitmap counting – direct bitmap Different flows have different hash values HASH(blue)=00100100
Bitmap counting – direct bitmap Packets from the same flow always hash to the same bit HASH(green)=10001001
Bitmap counting – direct bitmap HASH(violet)=10010101
Bitmap counting – direct bitmap HASH(orange)=11110011
Bitmap counting – direct bitmap HASH(pink)=11100000