BOTNET JUDO: Fighting Spam with Itself By: Pitsillidis, Levchenko, Kreibich, Kanich, Voelker, Paxson, Weaver, and Savage Presentation by: Heath Carroll
Presentation Overview • Abstract - What was the intent of the paper? • Introduction - current problems faced and methods used to combat them • Background - Def: Botnet, Regular Expression, Template-based Spam • Approach - How the authors dealt with this problem
Abstract • Botnet Judo: Fighting Spam with Itself or ‘Botnet Host Quarantine: What’d we learn?’ • Examination of a controlled, isolated, Botnet host. • Quick generation of precise and accurate spam filters with ~ 0 false positives
Introduction : Botnets • Definition: Botnet - a collection of software agents, or robots, that run autonomously and automatically. The term is most commonly associated with malicious software, but it can also refer to a network of computers using distributed computing software. (en.wikipedia.org/wiki/Botnet) • Example: DDoS attack against Blue Security, May 2, 2006
Botnets (cont’d) • Common uses of botnets: • Denial-of-service attacks • Adware • Spyware • Email spam (template, image, etc) • Click fraud • Internet Access number replacement • Fast flux (DNS Url/IP address switching)
SPAM!! • Template Based Spam • Botnet uses a template (in effect an RE) to produce massive amounts of highly varied spam • Harder to [content] filter initially due to varied message makeup • Requires defenders to collect ‘suspect’ spam in order to build an effective content-based filter • Harder to [sender] filter due to massive host lists • Requires defenders to rely on alternative methods to combat the botnet
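To make the "highly varied spam" point concrete, here is a minimal sketch of how a bot might fill in a template. Everything in it is an assumption made for illustration: the `{name}` macro syntax, the word lists, and the `instantiate` helper are hypothetical and are not taken from any real botnet or from the paper.

```python
import random
import string

# Hypothetical template: fixed text (anchors) interleaved with macros.
TEMPLATE = "Subject: {greeting} {noise}\nBuy {product} at http://{domain}/"

# Hypothetical dictionaries a bot might draw values from.
DICTIONARIES = {
    "greeting": ["Hello", "Hi", "Dear friend"],
    "product": ["watches", "pills", "software"],
    "domain": ["example.com", "example.net"],
}


def random_noise(rng, length=6):
    """Random lowercase string, standing in for a 'noise' macro."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def instantiate(template, rng):
    """Fill every macro in the template with a randomly chosen value."""
    values = {name: rng.choice(words) for name, words in DICTIONARIES.items()}
    values["noise"] = random_noise(rng)
    return template.format(**values)


rng = random.Random(0)  # seeded only so the demo is repeatable
for _ in range(3):
    print(instantiate(TEMPLATE, rng))
    print("---")
```

Each message shares the fixed text but differs in the macro values, which is why a filter trained on one instance tends to miss the next.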
SPAM!! • Preventative measures: • Anti-virus software • Passive OS fingerprinting • Network based approaches (nullrouting) • Spam filtering • Directed study • The last two are covered by this paper
Anti-spam!! • Basically 2 different approaches: • Content-based : • Filtering based on established heuristics and learning algorithms focused against specific message features • Can be highly effective (esp against targeted botnets) • Labor intensive to maintain since the basic technique can be countered by chaff and poisoning attacks • Hard to maintain low false positives from the filter • Blacklisting URLs can also be effective, but needs large up-to-date white-lists to avoid poisoning • Doesn’t do anything if spam doesn’t utilize URLs
Anti-Spam!! (cont’d) • Sender-based • Focuses on spam delivery system • Assumes sender of spam is likely to repeat sending spam, and not likely to send legitimate messages • Basically works by Blacklisting offending senders after the fact • Doesn’t work against newest spam • Botnets are an effective work-around since the controller distributes his spam over a large number of hosts
Anti-Spam!! (cont’d) • Template-based spam filtering: • Suspected Botnet generated spam is examined and deconstructed into a Regular Expression (RE) • Works very well against static botnets, but requires a lot of instances of suspected spam to deconstruct • Useless if controller changes the RE used by the bots
Regular Expressions (cont’d) • Review:
JUDO!! • Generates regular expression signatures to thwart spam • Operates by examining the output from quarantined botnet • Uses template inference algorithm to generate a set of signatures matching all previous messages
JUDO!! (cont’d) • Header Filtering • Anchor identification • Macro classification • Dictionary • Micro-anchor • Noise • Special Tokens • Signature Update • Second Chance • Pre-clustering
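The anchor identification and macro classification steps can be sketched roughly as follows. This is a simplified stand-in, not the paper's inference algorithm: it assumes every sample has the same number of whitespace-separated tokens, treats a token position as an anchor when all samples agree, and uses a hypothetical `dict_threshold` to decide between a dictionary macro and a noise macro. Micro-anchors, header filtering, and special tokens are left out.

```python
import re


def infer_signature(messages, dict_threshold=3):
    """Infer a regex signature from sample messages (toy version).

    Token positions where every sample agrees become anchors.
    Varying positions become a dictionary macro (alternation) when
    few distinct values were seen, otherwise a noise macro.
    """
    rows = [m.split() for m in messages]
    if len({len(r) for r in rows}) != 1:
        raise ValueError("this sketch assumes equal token counts")

    parts = []
    for column in zip(*rows):
        distinct = sorted(set(column))
        if len(distinct) == 1:
            # Anchor: identical in every sample.
            parts.append(re.escape(distinct[0]))
        elif len(distinct) <= dict_threshold:
            # Dictionary macro: small set of observed values.
            alternatives = "|".join(re.escape(v) for v in distinct)
            parts.append("(?:" + alternatives + ")")
        else:
            # Noise macro: too varied to enumerate.
            parts.append(r"\S+")
    return re.compile(r"\s+".join(parts))


samples = [
    "Buy watches now ref ab12",
    "Buy pills now ref zq90",
    "Buy watches now ref k3k3",
    "Buy pills now ref 77xy",
]
sig = infer_signature(samples)
print(sig.pattern)
print(bool(sig.fullmatch("Buy pills now ref qqqq")))
print(bool(sig.fullmatch("Lunch at noon on Friday")))
```

Because the signature is anchored on text the template always emits and only admits dictionary values actually observed, it stays narrow, which is the property that keeps false positives low.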
Judo - Second Chance Mechanism • Used to mitigate the effects of a small training buffer • If a message fails to match any existing signature • It is re-checked against the signatures using only their anchors • If the anchors match, that signature is updated to cover the new message
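A rough sketch of the second chance idea, under stated assumptions: the `Signature` class, its token-based inference, and the return values of `check` are all hypothetical, chosen only to show the control flow (full match first, anchors-only match second, then regenerate the signature).

```python
import re


class Signature:
    """Toy signature: each token position is an anchor or a dictionary."""

    def __init__(self, messages):
        self.messages = list(messages)
        self._rebuild()

    def _rebuild(self):
        columns = list(zip(*(m.split() for m in self.messages)))
        # Anchors are positions where every training message agrees.
        self.anchors = [c[0] for c in columns if len(set(c)) == 1]
        parts = []
        for c in columns:
            values = sorted(set(c))
            parts.append("(?:" + "|".join(map(re.escape, values)) + ")")
        self.full = re.compile(r"\s+".join(parts))
        # Anchors-only pattern: anchors in order, anything in between.
        self.loose = re.compile(".*".join(map(re.escape, self.anchors)), re.S)

    def check(self, message):
        """Return 'match', 'updated' (second chance) or 'miss'."""
        if self.full.fullmatch(message):
            return "match"
        same_shape = len(message.split()) == len(self.messages[0].split())
        if self.anchors and same_shape and self.loose.search(message):
            # Second chance: anchors agree, so fold the message in
            # and regenerate the signature from the larger sample.
            self.messages.append(message)
            self._rebuild()
            return "updated"
        return "miss"


sig = Signature(["Buy watches now", "Buy pills now"])
for text in ["Buy pills now", "Buy software now",
             "Buy software now", "See you at lunch"]:
    print(text, "->", sig.check(text))
```

The second "Buy software now" matches outright because the first one triggered an update, which is how a signature built from a small buffer can catch up with dictionary values it had not yet seen.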
Judo - Pre-clustering • Used to mitigate the effects of overly large training buffers (potentially mixed RE’s) • Skeleton signatures used to sort incoming messages prior to running Judo on them • Similar to second chance mechanism, but with a larger allowable anchor size
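Pre-clustering can be pictured as a sorting pass that runs before inference. In this sketch the skeleton signatures are given as plain lists of anchor strings; the `pre_cluster` helper, the skeleton names, and the "unclustered" bucket are assumptions for illustration rather than the paper's implementation.

```python
import re
from collections import defaultdict


def skeleton_regex(anchors):
    """Anchors must appear in order; anything may sit between them."""
    return re.compile(".*".join(re.escape(a) for a in anchors), re.S)


def pre_cluster(messages, skeletons):
    """Sort messages into buckets by the first skeleton they match."""
    compiled = {name: skeleton_regex(a) for name, a in skeletons.items()}
    clusters = defaultdict(list)
    for msg in messages:
        for name, rx in compiled.items():
            if rx.search(msg):
                clusters[name].append(msg)
                break
        else:
            # No skeleton matched; keep the message aside.
            clusters["unclustered"].append(msg)
    return dict(clusters)


skeletons = {
    "pharma": ["Buy", "pills"],
    "watch": ["Luxury", "watches"],
}
mixed_buffer = [
    "Buy cheap pills today",
    "Luxury Swiss watches here",
    "Meeting moved to 3pm",
]
for name, msgs in pre_cluster(mixed_buffer, skeletons).items():
    print(name, msgs)
```

Running inference on each bucket separately avoids merging two unrelated templates into one overly general signature.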
Experimental Results • Requirements of a good spam filter: • Safe: does not classify legitimate mail as spam • Low false positive rate • Effective: correctly identifies the targeted class of spam • Low false negative rate
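The two requirements map onto two standard rates. The sketch below computes them from labelled outcomes; the `error_rates` helper and the sample data are hypothetical and are not results from the paper.

```python
def error_rates(outcomes):
    """Compute (false_positive_rate, false_negative_rate).

    `outcomes` is a list of (flagged_as_spam, actually_spam) pairs.
    The false positive rate is taken over legitimate mail and the
    false negative rate over spam.
    """
    legit = [flagged for flagged, spam in outcomes if not spam]
    spam = [flagged for flagged, spam in outcomes if spam]
    fp_rate = sum(legit) / len(legit) if legit else 0.0
    fn_rate = sum(not f for f in spam) / len(spam) if spam else 0.0
    return fp_rate, fn_rate


# Made-up example: 4 legitimate messages, 2 spam messages.
outcomes = [
    (True, True),    # spam, caught
    (False, True),   # spam, missed      -> false negative
    (False, False),  # legitimate, passed
    (True, False),   # legitimate, flagged -> false positive
    (False, False),
    (False, False),
]
fp, fn = error_rates(outcomes)
print(f"false positive rate: {fp:.2%}")
print(f"false negative rate: {fn:.2%}")
```

A "safe" filter drives the first number toward zero and an "effective" filter drives the second toward zero.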
Experimental Results (cont’d) • Testing: 4 tiers • Signature safety • Signatures from 3 other tiers run against legitimate mail ‘corpora’ to assess false positive rate • To prevent age bias, they tested the signatures only on the subject and body of the corpora
Experimental Results (cont’d) • Controlled single template inference • Generated 5000 instances of spam from a ‘Storm’ bot from templates gained through reverse engineering • 1000 for signature generation • 4000 for testing false negative rate • Done for each of 10,676 templates (53,380,000 messages) • Results: • Also, at k = 1000 false positive rate = 0% for all sigs
Experimental Results (cont’d) • Controlled multi-template inference • Spam used for testing generated during the Botlab project at the University of Washington • 4 bots used: 1 each from Mega-D, Pushdo, Rustock, and Srizbi botnets • First million messages from each split into training and testing sets, then Judo run chronologically on each test message • Counted as a true match if the message matched a signature generated from previous test messages • Otherwise counted as false negative
Experimental Results (cont’d) • Results: • False positives came only from the Rustock bot tests
Experimental Results (cont’d) • Real world deployment: • 2xXarvester + 2xMega-D + 4xRustock + 6xGheg = 14 bots • Messages generated: • Ran the test as in multi-template runs
Experimental Results (cont’d) • Results: • Worst Case: Rustock was again the only source of false positives: 1 in 12,500 messages. All other bots produced 0 false positives against the corpora
Experimental Results (cont’d) • Efficiency: Since the goal of the project was an accurate RE generator, efficiency wasn’t a priority • Initial RE generation using buffer size 50 with 6000 character length messages takes about 2 sec using an average desktop circa 2009 • Signature updates at ~ 50-100 ms
Response Time • Based on the message output rate of the bot(s) generating the spam • May be complicated by the existence of multiple bots or templates • Bots used in this experiment generated > 100 spam messages per minute • Since results are acceptable from k >= 500, it should only take a few minutes to generate a working signature
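The "few minutes" estimate follows from dividing the training size k by the bot's output rate. A small sketch of that arithmetic, using the figures on this slide; the `minutes_to_signature` helper is hypothetical, and it ignores signature generation time and any interleaving of multiple templates.

```python
def minutes_to_signature(k, messages_per_minute):
    """Minutes needed to collect k training messages from one bot."""
    if messages_per_minute <= 0:
        raise ValueError("rate must be positive")
    return k / messages_per_minute


# Slide figures: k >= 500 gives acceptable results, bots send > 100/min,
# so 100/min is the slowest case and gives an upper bound on the wait.
print(minutes_to_signature(500, 100))
print(minutes_to_signature(1000, 100))
```

At 100 messages per minute this works out to 5 minutes for k = 500 and 10 minutes for k = 1000, and faster bots shorten the wait.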
Overview • ‘Judo’ is basically a learning spam filter • Content based • Requires training to produce effective signatures • Safe and Effective (both greater than 99.75%) • Controlled tests show exceptional results • Simulated real world tests show promise, but could be worked around by bots that can randomly generate new templates