Worms: Taxonomy and Detection

Mark Shaneck • 2/6/2004

Outline • Introduction • Worm Classification • Spreading Media • Target Acquisition • Polymorphic Worms • Detection / Prevention • Conclusion

Introduction • Worms are common and costly • So far, mostly benign… • Need to react within seconds, faster than a human can respond

Spreading Media • Traditional • Email • Windows File Sharing • Hybrid

Traditional • Self-propagate through the network • Exploit some vulnerability to automatically execute the worm payload • Most common exploit: buffer overflow • Least common type of worm in existence • Largest potential danger • Spreads fastest • Main subject of detection and containment research

Email • Spreads through email • Relies on humans or poor application design • Most are executable attachments • Nimda executed automatically when previewed • Most common form of worm • Very hard to detect, but they spread slowly

Windows File Sharing • Spreads through Windows file shares • Worms rarely spread by this method alone • Very hard to penetrate a network perimeter this way • Usually use other methods to penetrate the network and then this method to spread within it

Hybrid Worms • Combination of methods • Example: Nimda • Spread through email • Copied itself to open network shares (was executed if someone viewed it in Windows Explorer) • Traditional methods • Used subnet scanning to look for open Code Red II and Sadmind backdoors • Exploited multiple IIS Directory Traversal vulnerabilities • Modified web pages to cause clients to download and execute the worm payload

Hybrid Worms • Detection difficulties • Propagation pattern is difficult to predict since humans are involved • If one method is blocked it might find another way in…

Target Acquisition • Random Scanning • Subnet Scanning • Routing Worm • Pre-generated Hit List • Topological • Stealth / Passive

Random Scanning • A 32-bit number is randomly generated and used as the IP address • Slammer and Code Red I • Frequently hits black (unallocated) IP space • Only 28.6% of IP space is allocated

Subnet Scanning • Randomly generates the last 1, 2, or 3 bytes of the IP address • Code Red II and Blaster • Some scans must be completely random to infect the whole Internet

Routing Worm • BGP information can tell which IP address blocks are allocated • This information is publicly available • http://www.routeviews.org/ • http://www.ripe.net/ris/

BGP Routing Worm • By including routable prefixes in the worm payload, it can limit its scanning to allocated addresses • Could reduce scanning space by 71.4% • Aggregation and compression could reduce the space needed to 175 KB • Compare with existing worm sizes: Slammer 376 bytes, Blaster 6 KB, Nimda 57 KB

Class A Routing Worm • By examining BGP data you can see which Class A address blocks are allocated • Only 116 of 256 Class A blocks are publicly routable (45.3% of total IP space) • Only 116 extra bytes are needed to cut the scanning space by more than half
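The percentages quoted for the two routing-worm variants can be checked with a few lines of arithmetic. This is only a sketch using the figures from the slides; the variable names are illustrative, and the one-byte-per-block reading of "116 extra bytes" is an assumption.

```python
# Figures quoted in the slides; names are illustrative only.
TOTAL_CLASS_A = 256          # number of Class A (first-octet) blocks
ROUTABLE_CLASS_A = 116       # publicly routable Class A blocks per BGP data
ALLOCATED_FRACTION = 0.286   # share of IPv4 space that is allocated

# Fraction of the address space a Class A routing worm still has to scan.
class_a_fraction = ROUTABLE_CLASS_A / TOTAL_CLASS_A

# Fraction of the address space a full BGP prefix list lets the worm skip.
bgp_reduction = 1 - ALLOCATED_FRACTION

# Assumption: one byte per routable first octet gives the 116-byte list.
class_a_list_bytes = ROUTABLE_CLASS_A * 1

print(f"Class A list keeps {class_a_fraction:.1%} of the address space")
print(f"BGP prefix list removes {bgp_reduction:.1%} of the address space")
print(f"Class A list size: {class_a_list_bytes} bytes")
```

The trade-off is payload size: the Class A list costs 116 bytes for roughly a 55% reduction, while the full prefix list costs about 175 KB for a 71.4% reduction.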

Pre-generated Hit List • Hit list of vulnerable machines is sent with payload • Determined before worm launch by scanning • Gives the worm a boost in the slow start phase • Skips the phase that follows the exponential model • Infection rate looks linear in the rapid propagation phase • Can avoid detection by the early detection systems
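The effect of a hit list on the slow start phase can be illustrated by stepping the simple epidemic model forward in time. This is a rough sketch, not a model of any particular worm: the population size and hit-list size are hypothetical, the scan rate is the Slammer figure quoted later in the slides, and `seconds_to_reach` is a helper introduced here for illustration.

```python
def seconds_to_reach(fraction, vulnerable, initial, scans_per_sec,
                     address_space=2**32):
    """Step the simple epidemic model one second at a time and return
    how many seconds pass before `fraction` of the hosts are infected."""
    infected = float(initial)
    target = fraction * vulnerable
    seconds = 0
    while infected < target:
        # Each scan hits a not-yet-infected vulnerable host with
        # probability (vulnerable - infected) / address_space.
        new = scans_per_sec * infected * (vulnerable - infected) / address_space
        infected = min(vulnerable, infected + new)
        seconds += 1
    return seconds

VULNERABLE = 500_000   # hypothetical vulnerable population
SCAN_RATE = 4_000      # scans per second per host (Slammer figure from the slides)

single_seed = seconds_to_reach(0.5, VULNERABLE, 1, SCAN_RATE)
with_hit_list = seconds_to_reach(0.5, VULNERABLE, 10_000, SCAN_RATE)

print(f"one initial host:    {single_seed} s to reach 50%")
print(f"10,000-host hit list: {with_hit_list} s to reach 50%")
```

Under these assumptions the hit-list run starts well past the slow start phase, which is the window early detection systems rely on.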

Topological • Uses info on the infected host to find the next target • Morris Worm used Network Yellow Pages and the /etc/hosts file to find more hosts • Email worms use address books • P2P systems usually store info about the hosts they connect to

Stealth / Passive • Waits for a vulnerable system to contact it • Hides the infection among normal traffic • No active scanning • Nimda - modification of server web pages • P2P systems - infected host could respond to requests with the worm

Polymorphic Worms • Worms can easily be enhanced for self-modification • Simple encryption with random key would randomize the payload • Small decryption routine would remain • This could be obfuscated and randomized as well • Random do-nothing instructions • Random padding • Exploit might remain common • Nimda email - no exploit data • Buffer Overflow - return address might be same

Detection / Prevention • Ideal: Dynamic Quarantine and Automatic Signature Generation • IPv6 vs. Worms • EarlyBird • Honeycomb • BGP Information • Kalman Filter • Hidden Markov Models • Email Worm Detection

Ideal • Detect worm outbreak quickly • Automatically generate signatures and filter packets immediately • Distribute alerts and signatures faster than worms can spread • Is this possible?

IPv6 vs. Worms • IPv6 has 2^128 IP addresses • Smallest subnet has 2^64 addresses • Equivalent to 4 billion IPv4 Internets • Consider a sub-network • 1,000,000 vulnerable hosts • 100,000 scans per second (Slammer - 4,000) • 1,000 initially infected hosts • It would take 40 years to infect 50% of the vulnerable population with random scanning • Scan-based worms will be ineffective
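The 40-year figure can be reproduced from the closed-form solution of the simple epidemic model. The sketch below assumes the scan rate is per infected host and that scanning is uniform over a single 2^64 subnet; the function name and the choice of model are assumptions made here, not something the slides specify.

```python
import math

def seconds_to_infect(fraction, vulnerable, initially_infected,
                      scans_per_sec, address_space):
    """Time for the logistic (simple epidemic) model
    dI/dt = (scans_per_sec / address_space) * I * (N - I)
    to grow from `initially_infected` to `fraction * vulnerable`."""
    growth_rate = scans_per_sec * vulnerable / address_space  # beta * N
    target = fraction * vulnerable
    # Solving I(t) = N*I0*e^(rt) / (N - I0 + I0*e^(rt)) for e^(rt):
    ratio = (target * (vulnerable - initially_infected)) / (
        initially_infected * (vulnerable - target))
    return math.log(ratio) / growth_rate

seconds = seconds_to_infect(
    fraction=0.5,
    vulnerable=1_000_000,
    initially_infected=1_000,
    scans_per_sec=100_000,
    address_space=2**64,
)
years = seconds / (365.25 * 24 * 3600)
print(f"about {years:.0f} years to infect 50% of vulnerable hosts")
```

With these inputs the model lands close to the 40 years quoted above, which suggests the slide's figure rests on similar assumptions.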

EarlyBird • “Flows” are identified by packet content (or a hash of the content) • Counters of distinct sources and destinations are kept for popular flows • When the counts cross a threshold, the flow is considered a worm and its content is used as the signature • Additional “guilt” can be assigned to flows sent to black address space
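The counting idea can be sketched as follows. This is a simplified illustration, not the EarlyBird implementation: it keeps exact sets in memory, uses SHA-1 as the content hash, and the class name and thresholds are hypothetical.

```python
import hashlib
from collections import defaultdict

class ContentFlowCounter:
    """Tracks distinct sources and destinations per packet-content hash."""

    def __init__(self, src_threshold, dst_threshold):
        self.src_threshold = src_threshold
        self.dst_threshold = dst_threshold
        self.sources = defaultdict(set)       # content hash -> source addresses
        self.destinations = defaultdict(set)  # content hash -> destination addresses

    def observe(self, src, dst, payload):
        """Record one packet; return True if its content now looks worm-like."""
        key = hashlib.sha1(payload).hexdigest()
        self.sources[key].add(src)
        self.destinations[key].add(dst)
        return (len(self.sources[key]) >= self.src_threshold
                and len(self.destinations[key]) >= self.dst_threshold)

counter = ContentFlowCounter(src_threshold=3, dst_threshold=3)
packets = [
    ("10.0.0.1", "10.0.1.1", b"exploit-bytes"),
    ("10.0.0.2", "10.0.1.2", b"exploit-bytes"),
    ("10.0.0.3", "10.0.1.3", b"exploit-bytes"),
]
flags = [counter.observe(src, dst, data) for src, dst, data in packets]
print(flags)
```

Counting distinct endpoints rather than raw packets means a single busy client-server pair repeating the same content does not trip the threshold.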

EarlyBird • Benefits • Counts distinct sources and destinations • Most other systems simply examine total traffic on a particular port and look for changes in the traffic pattern

EarlyBird • Packet content examination can be evaded with simple polymorphism • They suggest using sampled Rabin fingerprinting to find commonly occurring fixed-length strings • If a polymorphic worm's packets have only 4 bytes in common, then the packets will be identified by only those 4 bytes… How can they be differentiated from other packets?
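Sampled fingerprinting of fixed-length substrings can be sketched with a rolling hash. The code below uses a Rabin-Karp style polynomial hash over integers as a stand-in for true Rabin fingerprints (which work with polynomials over GF(2)); the window size, base, modulus, and sampling rule are all assumptions chosen for illustration.

```python
def sampled_fingerprints(payload, window=8, sample_mod=4):
    """Return fingerprints of every `window`-byte substring of `payload`,
    keeping only those whose value is divisible by `sample_mod`."""
    base = 257
    modulus = (1 << 61) - 1
    if len(payload) < window:
        return set()

    # Weight of the byte that leaves the window on each slide.
    high = pow(base, window - 1, modulus)

    fingerprint = 0
    for byte in payload[:window]:
        fingerprint = (fingerprint * base + byte) % modulus

    kept = set()
    if fingerprint % sample_mod == 0:
        kept.add(fingerprint)

    # Slide the window one byte at a time, updating in O(1) per step.
    for i in range(window, len(payload)):
        outgoing = payload[i - window]
        fingerprint = ((fingerprint - outgoing * high) * base + payload[i]) % modulus
        if fingerprint % sample_mod == 0:
            kept.add(fingerprint)
    return kept

a = sampled_fingerprints(b"AAAA-common-worm-body-BBBB", sample_mod=1)
b = sampled_fingerprints(b"CCCCCC-common-worm-body-DD", sample_mod=1)
print(len(a & b), "fingerprints in common")
```

Because sampling depends only on the fingerprint value, two packets that share a substring always agree on whether that substring is sampled. The window length sets the shortest invariant that can be found, which is where the 4-byte concern above comes in.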

Honeycomb • Plugin to honeyd • Assumption: All traffic to a honeypot is suspicious • For every inbound packet - use longest common substring (LCS) algorithm to find a signature (after performing header analysis) • Adds signature to the signature pool • Periodically outputs signature pool to Snort/Bro • Problems: Traffic to regular hosts? Polymorphism?
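The longest common substring step can be sketched with a standard dynamic-programming routine. This is a minimal illustration of the technique on two payloads; Honeycomb's own implementation may use a different algorithm, and the function name and sample payloads are made up for the example.

```python
def longest_common_substring(a, b):
    """Return the longest contiguous byte string present in both a and b."""
    best_len = 0
    best_end = 0  # end index (exclusive) of the best match within a
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the match that ended at the previous byte pair.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr
    return a[best_end - best_len:best_end]

first = b"xxGET /cmd.exe?yy"
second = b"zzzGET /cmd.exe?q"
signature = longest_common_substring(first, second)
print(signature)
```

The quadratic cost is acceptable for a sketch; it also shows why polymorphism is a problem, since two encodings of the same worm may share only a very short substring.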

BGP Information • Use black address space to watch for scans • Will only be useful in detecting random scanning worms • Use AS profiling to build a model of how much traffic comes from each AS and watch for drastic changes • Will it detect in time?

Kalman Filter • Worm propagation follows the epidemic model
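For reference, the simple epidemic model is commonly written as below. The symbols are assumptions introduced here, since the slide does not show the equation: I(t) is the number of infected hosts, N the number of vulnerable hosts, β the pairwise infection rate, and α = βN the early exponential growth rate.

```latex
\frac{dI(t)}{dt} = \beta\, I(t)\,\bigl[N - I(t)\bigr],
\qquad
I(t) \approx I(0)\, e^{\alpha t}
\quad \text{with } \alpha = \beta N \text{ while } I(t) \ll N
```

The early exponential phase is the part a detector can try to recognize before the outbreak saturates.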

Kalman Filter • Best current system, by Don Towsley et al. • Distribute sensors (ingress and egress filters) around the network to measure • Scan rate • Scan distribution • Total number of scans • Total number of infected hosts • Info sent to a centralized Malware Warning Center (MWC)

[Figure: Kalman filter on-line estimation of the exponential rate from the monitored illegitimate traffic rate. Labels: worm traffic, non-worm traffic burst.]

Kalman Filter • MWC uses a Kalman filter to calculate the trend in the growth • If it matches the exponential model, it is considered a worm • Sensors measure this information from packets sent to black IP space • Sensors must monitor 2^20 IP addresses to get accurate information • Can be circumvented by a hit-list or topological worm
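The trend estimation can be sketched with a scalar Kalman filter that treats the per-interval growth rate as a slowly varying hidden state. This is a simplified illustration, not the system described above: the observation (relative change in monitored scan counts), the noise variances, and the function name are all assumptions.

```python
def estimate_growth_rate(counts, meas_var=0.05, process_var=1e-6):
    """Scalar Kalman filter over the relative growth of `counts`.
    Returns the running estimate of the per-interval growth rate."""
    rate = 0.0       # current estimate of the growth rate
    variance = 1.0   # uncertainty of that estimate
    estimates = []
    for prev, curr in zip(counts, counts[1:]):
        if prev <= 0:
            continue  # relative growth is undefined for an empty interval
        observed = (curr - prev) / prev

        # Predict: the rate is modelled as nearly constant.
        variance += process_var

        # Update: blend the prediction with the new observation.
        gain = variance / (variance + meas_var)
        rate += gain * (observed - rate)
        variance *= (1 - gain)

        estimates.append(rate)
    return estimates

# Synthetic scan counts: exponential growth vs. flat background traffic.
worm_like = [100 * 1.2 ** t for t in range(20)]
background = [100] * 20

worm_estimates = estimate_growth_rate(worm_like)
background_estimates = estimate_growth_rate(background)
print(f"worm-like estimate:  {worm_estimates[-1]:.3f}")
print(f"background estimate: {background_estimates[-1]:.3f}")
```

One possible decision rule is to raise an alert when the estimate settles at a positive value over several intervals. A short non-worm burst would push the estimate up only briefly before it decays again.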

Hidden Markov Model • Not very useful in worm detection • HMMs are based on changes in states • Worm outbreaks effectively consist of two states - vulnerable and infected • To be of use the transition to infected would need to be detected, which is basically worm detection…

Email Worm Detection • Email Mining Toolkit (EMT) - Columbia • Cliques - users usually send email to particular sets of users • Assumption: If user sends to a set that is not a subset of a clique, something is wrong • Anomaly detection to find suspicious email to be examined in more detail • Problems: If user sends one broadcast email, clique is useless. False positives.
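The clique assumption reduces to a subset test. The sketch below is only an illustration of that rule, not EMT's actual code: the function name, the sample addresses, and the representation of cliques as sets are assumptions.

```python
def is_suspicious(recipients, cliques):
    """Flag a message whose recipient set is not contained in any known clique."""
    recipient_set = frozenset(recipients)
    return not any(recipient_set <= clique for clique in cliques)

# Hypothetical cliques learned from a user's past outgoing mail.
known_cliques = [
    frozenset({"alice@example.com", "bob@example.com", "carol@example.com"}),
    frozenset({"dave@example.com", "erin@example.com"}),
]

normal = is_suspicious(["alice@example.com", "bob@example.com"], known_cliques)
mixed = is_suspicious(["alice@example.com", "dave@example.com"], known_cliques)
print(normal, mixed)
```

A worm mailing the whole address book crosses clique boundaries and is flagged, but so is a legitimate message to an unusual mix of recipients, which is the false-positive problem noted above. A single earlier broadcast email would create one clique containing everyone, after which nothing is flagged.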

Conclusion • Ideal in fighting worms - detection and quarantine / signature generation • Most research focuses on early detection • It is not clear how to protect after detection • Is it enough to close the port? • Ban offending IP addresses temporarily? • Is it possible to automatically generate signatures for any worm?
