
Automated Worm Fingerprinting

Explore automated worm detection methods using context analysis, content sifting, and address dispersion to identify unknown worms efficiently in real-time network traffic. Learn about signature generation, content prevalence, and scalable deployment strategies.

pmeredith




Presentation Transcript


  1. Automated Worm Fingerprinting. Sumeet Singh, Cristian Estan, George Varghese, and Stefan Savage (presented by Manan Sanghi)

  2. The menace

  3. Context
     • Worm detection
        • Scan detection
        • Honeypots
        • Host-based behavioral detection
        • Payload-based ???

  4. Context
     • Characterization
        • A priori vulnerability signatures: generally manual
        • Honeycomb: host based; longest common subsequences
        • Autograph: network-level automatic signature generation

  5. Context: Internet Quarantine
     • Containment
        • Host quarantine
        • String matching
        • Connection throttling
     • [Diagram labels: Address Blacklisting, Content Filtering]

  6. Worm behavior
     • Content invariance
        • Polymorphism is limited (e.g., encryption)
        • Key portions are invariant (e.g., the decryption routine)
     • Content prevalence
        • Invariant portions appear frequently
     • Address dispersion
        • The number of distinct infected hosts grows over time
        • This is reflected in many different source and destination addresses

  7. Key idea
     • Detect unknown worms on the basis of
        • A common exploit sequence
        • A range of unique sources and destinations

  8. Content sifting
     • For each string w, maintain
        • prevalence(w): number of times w is found in the network traffic
        • sources(w): number of unique sources sending w
        • destinations(w): number of unique destinations receiving w
     • If all thresholds are exceeded, then block(w)
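As a baseline, content sifting can be written with exact per-string counters. This is a minimal sketch, not the Earlybird implementation: the class name `NaiveContentSifter` is hypothetical, and it stores a full set of addresses for every string, which is precisely the memory cost the following slides try to avoid. The default thresholds are taken from the parameters listed on the Experience slide; "exceeded" is read as strictly greater than.

```python
from collections import defaultdict


class NaiveContentSifter:
    """Exact content sifting (hypothetical helper): one table entry per string w."""

    def __init__(self, prevalence_threshold=3, dispersion_threshold=30):
        self.prevalence_threshold = prevalence_threshold
        self.dispersion_threshold = dispersion_threshold
        self.prevalence = defaultdict(int)    # w -> number of occurrences seen
        self.sources = defaultdict(set)       # w -> distinct source addresses
        self.destinations = defaultdict(set)  # w -> distinct destination addresses
        self.blocked = set()

    def observe(self, w, src, dst):
        """Record one occurrence of w sent from src to dst; return True if w is blocked."""
        self.prevalence[w] += 1
        self.sources[w].add(src)
        self.destinations[w].add(dst)
        if (self.prevalence[w] > self.prevalence_threshold
                and len(self.sources[w]) > self.dispersion_threshold
                and len(self.destinations[w]) > self.dispersion_threshold):
            self.blocked.add(w)  # block(w)
        return w in self.blocked
```

Content that is repeated often but only between one pair of hosts never passes the dispersion test, which is what separates a spreading worm from, say, a popular object fetched repeatedly by one client.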

  9. Issues
     • How can prevalence(w), sources(w) and destinations(w) be computed efficiently?
     • Scalable
        • Low memory and CPU requirements
        • Real-time deployment over a gigabit-scale link

  10. prevalence(w)
     • w = entire packet
        • Use multistage filters (k-ary sketches?)
     • w = substring of small fixed length b
        • Rabin fingerprints
        • Value sampling
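For whole-packet keys, a multistage filter bounds memory by hashing each key into several small counter arrays instead of keeping one counter per key. The sketch below is a plain multistage filter in the style of Estan and Varghese; the sizes (4 stages of 1024 counters) and the use of BLAKE2b with a per-stage salt as the stage hash are assumptions, not values from the slides.

```python
import hashlib


class MultistageFilter:
    """Plain multistage filter: the estimate is the minimum counter over all stages.

    Collisions can only add to a counter, so the estimate never undercounts.
    """

    def __init__(self, stages=4, buckets=1024):
        self.stages = stages
        self.buckets = buckets
        self.counters = [[0] * buckets for _ in range(stages)]

    def _index(self, stage: int, key: bytes) -> int:
        # A different 16-byte salt per stage gives each stage an independent hash.
        digest = hashlib.blake2b(key, digest_size=8,
                                 salt=stage.to_bytes(16, "little")).digest()
        return int.from_bytes(digest, "little") % self.buckets

    def add(self, key: bytes) -> int:
        """Count one occurrence of key and return its (over)estimated prevalence."""
        for s in range(self.stages):
            self.counters[s][self._index(s, key)] += 1
        return self.estimate(key)

    def estimate(self, key: bytes) -> int:
        return min(self.counters[s][self._index(s, key)] for s in range(self.stages))
```

A key is treated as prevalent only when every stage's counter passes the threshold, e.g. `mf.add(packet) > 3`. This version increments every stage on each insert; refinements such as conservative update exist but are left out here.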

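  11. Value sampling
     • The problem: a packet of s bytes contains s - b + 1 substrings of length b, too many to track
     • Solution: sample
        • But random sampling is not good enough
        • Trick: sample only those substrings whose fingerprint matches a certain pattern
     • Since Rabin fingerprints are randomly distributed, the probability of tracking a worm whose invariant content is x bytes long is $P_{\text{track}}(x) = 1 - e^{-f(x-b+1)}$, where f is the fraction of fingerprints sampled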
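Value sampling works because the keep/drop decision depends only on the substring's own fingerprint, so the same invariant substring is kept in every packet that carries it, whatever its offset; random sampling would pick different offsets in different packets. The sketch below swaps in a Rabin-Karp-style polynomial rolling hash modulo the prime 2^61 - 1 as a stand-in for true Rabin fingerprints (which use polynomial arithmetic over GF(2)). The sampling pattern "low 6 bits are zero" (f = 1/64) and all function names are illustrative assumptions.

```python
import math

BASE = 257
MOD = (1 << 61) - 1  # Mersenne prime modulus for the rolling hash


def rolling_fingerprints(payload: bytes, b: int):
    """Yield (offset, hash) for each of the len(payload) - b + 1 substrings of length b."""
    if len(payload) < b:
        return
    top = pow(BASE, b - 1, MOD)  # weight of the byte leaving the window
    h = 0
    for byte in payload[:b]:
        h = (h * BASE + byte) % MOD
    yield 0, h
    for i in range(b, len(payload)):
        # Drop payload[i - b], shift the window, append payload[i].
        h = ((h - payload[i - b] * top) * BASE + payload[i]) % MOD
        yield i - b + 1, h


def value_sample(payload: bytes, b: int, mask_bits: int = 6):
    """Keep only fingerprints whose low mask_bits bits are zero (f = 2**-mask_bits)."""
    mask = (1 << mask_bits) - 1
    return [(off, h) for off, h in rolling_fingerprints(payload, b) if h & mask == 0]


def p_track(x: int, b: int, f: float) -> float:
    """P_track(x) = 1 - exp(-f (x - b + 1)) from the slide."""
    return 1.0 - math.exp(-f * (x - b + 1))
```

With f = 1/64 and b = 40, `p_track(400, 40, 1/64)` is about 0.996: a 400-byte invariant region is almost certain to contain at least one sampled substring.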

  12. sources(w) and destinations(w)
     • Address dispersion
        • Requires counting distinct elements rather than repeated ones
        • A simple list or hash table is too expensive
     • Key idea: bitmaps
     • Trick: scaled bitmaps

  13. Direct bitmap
     • Each content source is hashed into a bitmap and the corresponding bit is set; an alarm is raised when the number of bits set exceeds a threshold
     • Drawback: once the bitmap fills up, it no longer gives an estimate of the actual count
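A direct bitmap can be sketched in a few lines. The threshold alarm follows the slide; the count estimate uses linear counting, n ≈ -m ln(z/m) for m bits with z still zero, which is an assumed estimator chosen here because it makes the drawback visible: once every bit is set, the estimate becomes infinite. The bitmap size and the BLAKE2b hash are also assumptions.

```python
import hashlib
import math


class DirectBitmap:
    """Direct bitmap over m bits for counting distinct addresses."""

    def __init__(self, m: int = 1024):
        self.m = m
        self.bits = 0  # the bitmap, stored as a Python int

    def add(self, address: str) -> None:
        h = int.from_bytes(hashlib.blake2b(address.encode(), digest_size=8).digest(), "big")
        self.bits |= 1 << (h % self.m)  # repeats of an address set the same bit

    def bits_set(self) -> int:
        return bin(self.bits).count("1")

    def alarm(self, threshold_bits: int) -> bool:
        """Raise an alarm when more than threshold_bits bits are set."""
        return self.bits_set() > threshold_bits

    def estimate(self) -> float:
        """Linear-counting estimate of the number of distinct addresses."""
        zeros = self.m - self.bits_set()
        if zeros == 0:
            return math.inf  # saturated: no usable estimate remains
        return -self.m * math.log(zeros / self.m)
```

A bitmap sized for small counts saturates quickly, while one sized for large counts wastes memory on the many strings that are seen only a few times.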

  14. Scaled bitmap
     • Idea: subsample the range of the hash space
     • How does it work?
        • Multiple bitmaps, each mapped to a progressively smaller portion of the hash space
        • A bitmap is recycled when necessary
     • Result: roughly 5 times less memory, plus an actual estimate of address dispersion
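The scaled-bitmap idea can be sketched as follows. This is a simplified reading of the slide, not the paper's exact data structure: the 128-bit BLAKE2b hash, four 64-bit bitmaps, the 3/4 fill limit and the linear-counting estimator are all assumptions. Bitmap i covers hashes whose "level" (leading zero bits of the first 64 hash bits) equals base + i, a 2^-(base+i+1) slice of the hash space; the last bitmap also takes all deeper levels. Together they cover a 2^-base slice, so the summed count is scaled by 2^base.

```python
import hashlib
import math


class ScaledBitmap:
    """Simplified scaled bitmap (sketch).

    When the widest bitmap fills up it is discarded, base grows by one, and an
    empty bitmap is appended for the narrowest slice. Simplification: an address
    in the deepest levels that repeats across a recycle can be counted twice.
    """

    def __init__(self, m: int = 64, num_bitmaps: int = 4, fill_fraction: float = 0.75):
        self.m = m
        self.k = num_bitmaps
        self.fill_limit = int(m * fill_fraction)
        self.base = 0
        self.bitmaps = [0] * num_bitmaps  # each bitmap stored as a Python int

    def add(self, address: str) -> None:
        digest = hashlib.blake2b(address.encode(), digest_size=16).digest()
        high = int.from_bytes(digest[:8], "big")  # selects the slice (level)
        low = int.from_bytes(digest[8:], "big")   # selects the bit inside a bitmap
        level = 64 - high.bit_length()            # leading zero bits, 0..64
        if level < self.base:
            return  # this slice is no longer tracked (its bitmap was recycled)
        idx = min(level - self.base, self.k - 1)
        self.bitmaps[idx] |= 1 << (low % self.m)
        # Recycle: drop the widest bitmap while it is too full to estimate well.
        while bin(self.bitmaps[0]).count("1") > self.fill_limit:
            self.bitmaps.pop(0)
            self.bitmaps.append(0)
            self.base += 1

    def _linear_count(self, bitmap: int) -> float:
        zeros = self.m - bin(bitmap).count("1")
        return -self.m * math.log(max(zeros, 1) / self.m)

    def estimate(self) -> float:
        """Sum per-bitmap estimates and scale up by the subsampling factor 2**base."""
        return (2 ** self.base) * sum(self._linear_count(b) for b in self.bitmaps)
```

Here each counter costs 4 x 64 bits regardless of how many distinct addresses it has seen, yet it still returns an approximate count rather than only an alarm.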

  15. Putting it together

  16. Experience
     • System design: sensors and aggregators
        • Sensors sift through traffic within configurable address-space "zones of responsibility"
        • The aggregator coordinates real-time updates from the sensors, coalesces related signatures, and so on
     • Parameters
        • Content prevalence threshold: 3
        • Address dispersion threshold: 30
        • Garbage collection time: several hours
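The sensor/aggregator split and the listed parameters can be illustrated with a small aggregator. Everything here is hypothetical: the class names, the report format, and the 4-hour value chosen for "several hours". Reports carry plain address sets for readability; a real deployment would more likely exchange compact summaries such as bitmaps, which merge with a bitwise OR.

```python
from dataclasses import dataclass, field


@dataclass
class SiftingParams:
    prevalence_threshold: int = 3
    dispersion_threshold: int = 30
    gc_seconds: float = 4 * 3600.0  # assumed value for "several hours"


@dataclass
class Entry:
    prevalence: int = 0
    sources: set = field(default_factory=set)
    destinations: set = field(default_factory=set)
    last_update: float = 0.0


class Aggregator:
    """Merges per-sensor reports for each content signature (hypothetical sketch)."""

    def __init__(self, params: SiftingParams):
        self.params = params
        self.table = {}

    def report(self, signature, count, sources, destinations, now: float) -> bool:
        """Merge one sensor's report; return True if the signature should raise an alert."""
        e = self.table.setdefault(signature, Entry())
        e.prevalence += count
        e.sources |= set(sources)
        e.destinations |= set(destinations)
        e.last_update = now
        p = self.params
        return (e.prevalence > p.prevalence_threshold
                and len(e.sources) > p.dispersion_threshold
                and len(e.destinations) > p.dispersion_threshold)

    def garbage_collect(self, now: float) -> int:
        """Drop signatures not updated within gc_seconds; return how many were dropped."""
        stale = [s for s, e in self.table.items()
                 if now - e.last_update > self.params.gc_seconds]
        for s in stale:
            del self.table[s]
        return len(stale)
```

Merging at the aggregator lets a worm whose traffic is spread thinly across several sensors' zones still cross the dispersion threshold.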

  17. prevalence(w) threshold

  18. Address Dispersion threshold

  19. Garbage Collection threshold

  20. Trace-based False Positives

  21. Performance
     • Processing time:
     • Memory consumption: 4 MB

  22. Live experience
     • Detected known worms: CodeRed, …
     • Detected new worms: MyDoom, Sasser, Kibvu.B

  23. Limitations and extensions
     • Limitations
        • Variant content
        • Network evasion
     • Extension: dealing with slow worms

  24. Comparison Qinghua Zhang

  25. Breather

  26. Polygraph: Automatically Generating Signatures for Polymorphic Worms. James Newsome, Brad Karp, Dawn Song

  27. The case for polymorphic worms
     • A single substring is insufficient; a signature must be both
        • Sensitive: present in every payload of the worm
        • Specific: long enough not to appear in any non-worm payload

  28. Examples

  29. Signature classes
     • Signature: a set of tokens
     • Conjunction signatures
     • Token-subsequence signatures
     • Bayes signatures
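The three signature classes differ in how a set of tokens is matched against a payload. The functions below are a hedged sketch of that matching step only; the names are hypothetical, and the Bayes weights are assumed to be per-token log-likelihood ratios log(P(token | worm) / P(token | innocuous)) summed over the tokens present and compared to a threshold.

```python
def matches_conjunction(tokens, payload: bytes) -> bool:
    """Conjunction signature: every token appears somewhere in the payload."""
    return all(t in payload for t in tokens)


def matches_token_subsequence(tokens, payload: bytes) -> bool:
    """Token-subsequence signature: tokens appear in the given order, without overlap."""
    pos = 0
    for t in tokens:
        i = payload.find(t, pos)  # earliest occurrence leaves the most room for the rest
        if i < 0:
            return False
        pos = i + len(t)  # the next token must start after this one ends
    return True


def matches_bayes(weights, threshold: float, payload: bytes) -> bool:
    """Bayes-style signature: sum the weights of tokens present, compare to threshold."""
    score = sum(w for t, w in weights.items() if t in payload)
    return score > threshold
```

A conjunction ignores order, a token subsequence adds ordering, and a Bayes signature tolerates a missing token as long as the remaining evidence outweighs the threshold.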

  30. Problem Formulation

  31. Algorithms
     • Preprocessing
        • Extract the distinct substrings of minimum length l that occur in at least k samples of the suspicious pool
     • Generating signatures
        • Conjunction signatures
        • Token-subsequence signatures
        • Bayes signatures

  32. Wrap-up
     • Automated Worm Fingerprinting (OSDI 2004)
     • Polygraph: Automatically Generating Signatures for Polymorphic Worms (IEEE Symposium on Security and Privacy 2005)
  Manan Sanghi
