BOTNET JUDO Fighting Spam with Itself

BOTNET JUDO: Fighting Spam with Itself By: Pitsillidis, Levchenko, Kreibich, Kanich, Voelker, Paxson, Weaver, and Savage Presentation by: Heath Carroll

The Origins of Spam

Presentation Overview • Abstract - What was the intent of the paper? • Introduction - current problems faced and methods used to combat them • Background - Def: Botnet, Regular Expression, Template-based Spam • Approach - How the authors dealt with this problem

Abstract • Botnet Judo: Fighting Spam with Itself or ‘Botnet Host Quarantine: What’d we learn?’ • Examination of a controlled, isolated, Botnet host. • Quick generation of precise and accurate spam filters with ~ 0 false positives

Introduction : Botnets • Definition: Botnet - a collection of software agents, or robots, that run autonomously and automatically. The term is most commonly associated with malicious software, but it can also refer to a network of computers using distributed computing software. (en.wikipedia.org/wiki/Botnet) • Example: DDoS attack against Blue Security, May 2, 2006

Botnets (cont’d) • Common uses of botnets: • Denial-of-service attacks • Adware • Spyware • Email spam (template, image, etc) • Click fraud • Internet Access number replacement • Fast flux (DNS Url/IP address switching)

SPAM!! • Template-Based Spam • Botnet uses an RE to produce massive amounts of highly varied spam • Harder to [content] filter initially due to varied message makeup • Requires defenders to collect ‘suspect’ spam in order to build an effective content-based filter • Harder to [sender] filter due to massive host lists • Requires defenders to rely on alternative methods to combat the botnet
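As an illustration of why template-generated spam is so varied, here is a minimal sketch of a template being expanded into messages. The template syntax, the macro names (`greeting`, `shop`, `rand:N`) and the word lists are all hypothetical and are not taken from any real botnet.

```python
import random
import re

# Hypothetical template: {name} picks a word from a list,
# {rand:N} inserts N random lowercase letters.
TEMPLATE = "Subject: {greeting} from {shop}\n\nVisit http://{rand:8}.example.com/ today"

DICTIONARIES = {
    "greeting": ["Hello", "Great news", "Special offer"],
    "shop": ["MegaPharm", "WatchWorld"],
}

def instantiate(template, rng):
    """Expand every {macro} in the template into a concrete value."""
    def expand(match):
        macro = match.group(1)
        if macro.startswith("rand:"):
            length = int(macro.split(":")[1])
            return "".join(rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(length))
        return rng.choice(DICTIONARIES[macro])
    return re.sub(r"\{([^}]+)\}", expand, template)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
messages = [instantiate(TEMPLATE, rng) for _ in range(5)]
```

Every message differs in its word choices and random host name, yet the fixed text of the template appears in all of them. That shared text is what a template-derived signature can latch onto.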

SPAM!! • Preventative measures: • Anti-virus software • Passive OS fingerprinting • Network based approaches (nullrouting) • Spam filtering • Directed study • The last two are covered by this paper

Anti-spam!! • Basically 2 different approaches: • Content-based : • Filtering based on established heuristics and learning algorithms focused against specific message features • Can be highly effective (esp against targeted botnets) • Labor intensive to maintain since the basic technique can be countered by chaff and poisoning attacks • Hard to maintain low false positives from the filter • Blacklisting URLs can also be effective, but needs large up-to-date white-lists to avoid poisoning • Doesn’t do anything if spam doesn’t utilize URLs

Anti-Spam!! (cont’d) • Sender-based • Focuses on spam delivery system • Assumes sender of spam is likely to repeat sending spam, and not likely to send legitimate messages • Basically works by Blacklisting offending senders after the fact • Doesn’t work against newest spam • Botnets are an effective work-around since the controller distributes his spam over a large number of hosts

Anti-Spam!! (cont’d) • Template-based spam filtering: • Suspected Botnet generated spam is examined and deconstructed into a Regular Expression (RE) • Works very well against static botnets, but requires a lot of instances of suspected spam to deconstruct • Useless if controller changes the RE used by the bots

Regular Expressions

Regular Expressions (cont’d) • Review:

JUDO!! • Generates regular expression signatures to thwart spam • Operates by examining the output from quarantined botnet • Uses template inference algorithm to generate a set of signatures matching all previous messages
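The idea of turning a batch of observed messages into a regular expression can be sketched as follows. This is a deliberately simplified stand-in, not the paper's template inference algorithm: it uses Python's `difflib.SequenceMatcher` to find text shared by all messages and joins those pieces with wildcards. The function names and the `min_len` parameter are assumptions made for this example.

```python
import re
from difflib import SequenceMatcher

SEP = "\x00"  # assumed never to occur inside a message

def common_anchors(messages, min_len=4):
    """Return ordered substrings (>= min_len chars) present in every message."""
    anchors = [messages[0]]
    for msg in messages[1:]:
        joined = SEP.join(anchors)
        matcher = SequenceMatcher(None, joined, msg, autojunk=False)
        # Matching blocks come back in order and cannot span SEP,
        # because SEP never appears in msg.
        anchors = [
            joined[b.a:b.a + b.size]
            for b in matcher.get_matching_blocks()
            if b.size >= min_len
        ]
    return anchors

def build_signature(anchors):
    """Join the anchors with lazy wildcards; None if nothing is shared."""
    if not anchors:
        return None
    return re.compile(".*?".join(re.escape(a) for a in anchors), re.DOTALL)

training = [
    "Buy cheap watches at http://abcd.example.com/ now",
    "Buy cheap pills at http://wxyz.example.com/ now",
    "Buy cheap meds at http://qrst.example.com/ now",
]
anchors = common_anchors(training)
signature = build_signature(anchors)
```

By construction the resulting signature matches every message it was built from. A wildcard-only signature like this one is looser than what the slides describe, since the variable parts are not characterised at all.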

JUDO!! (cont’d) • Header Filtering • Anchor identification • Macro classification • Dictionary • Micro-anchor • Noise • Special Tokens • Signature Update • Second Chance • Pre-clustering
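To make the dictionary/noise distinction concrete, here is a rough sketch of classifying the variable text found between two anchors. The threshold `max_dict_ratio` and the rule itself are assumptions chosen for illustration; the paper's actual classification test is not reproduced here.

```python
import re

def classify_gap(values, max_dict_ratio=0.2):
    """Return a regex fragment for the values seen in one variable region.

    Few distinct values relative to the sample size -> treat as a
    dictionary (alternation). Otherwise -> treat as noise (character
    class with length bounds). Crude heuristic, for illustration only.
    """
    distinct = sorted(set(values))
    if len(distinct) <= max_dict_ratio * len(values):
        return "(?:" + "|".join(re.escape(v) for v in distinct) + ")"
    chars = "".join(sorted(set("".join(values))))
    if not chars:  # only empty strings were observed
        return ""
    lengths = [len(v) for v in values]
    return "[" + re.escape(chars) + "]{%d,%d}" % (min(lengths), max(lengths))

greetings = ["Hello", "Hi"] * 5
tokens = ["a3f9", "07bc", "ffe1", "9d20"]
dictionary_fragment = classify_gap(greetings)
noise_fragment = classify_gap(tokens)
```

Note that the noise fragment only admits characters that were actually observed, so a real system would presumably generalise it to a broader class (for example, all hex digits).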

Judo - Second Chance Mechanism • Used to mitigate the effects of a small training buffer • If a message fails to match an existing signature • It is re-checked using only the anchors • If matched, the signature is updated
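The second chance flow can be sketched like this. The `Signature` class, the `check` function and the toy `rebuild` helper are hypothetical names for this example; `rebuild` merely stands in for re-running inference over the enlarged sample set.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Signature:
    full: re.Pattern          # anchors plus inferred variable parts
    anchors_only: re.Pattern  # same anchors with wildcards in between
    samples: list = field(default_factory=list)

def rebuild(samples):
    """Toy stand-in for inference: collect the second word of each sample."""
    words = sorted({s.split()[1] for s in samples})
    alternation = "|".join(re.escape(w) for w in words)
    return re.compile(r"Buy (?:" + alternation + r") now")

def check(message, signatures):
    # First pass: the full signatures.
    for sig in signatures:
        if sig.full.fullmatch(message):
            return "match"
    # Second chance: anchors only. On a hit, update the signature.
    for sig in signatures:
        if sig.anchors_only.fullmatch(message):
            sig.samples.append(message)
            sig.full = rebuild(sig.samples)
            return "updated"
    return "unmatched"

samples = ["Buy pills now", "Buy watches now"]
sig = Signature(rebuild(samples), re.compile(r"Buy .*? now"), samples)
first = check("Buy meds now", [sig])     # fails full match, passes anchors
second = check("Buy meds now", [sig])    # signature now covers "meds"
third = check("Lunch at noon?", [sig])   # matches neither form
```

The point of the design is that a signature learned from too few samples is widened in place rather than replaced by a brand-new one.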

Judo - Pre-clustering • Used to mitigate the effects of overly large training buffers (potentially mixed RE’s) • Skeleton signatures used to sort incoming messages prior to running Judo on them • Similar to second chance mechanism, but with a larger allowable anchor size
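A minimal sketch of the sorting step, assuming the skeleton signatures already exist and are supplied as compiled patterns. The cluster names and patterns below are invented for the example.

```python
import re
from collections import defaultdict

def precluster(messages, skeletons):
    """Route each message to the first skeleton it matches.

    skeletons: dict mapping a cluster name to a compiled anchors-only
    pattern (hypothetical input; how skeletons are derived is not shown).
    """
    buckets = defaultdict(list)
    for msg in messages:
        for name, pattern in skeletons.items():
            if pattern.search(msg):
                buckets[name].append(msg)
                break
        else:
            # No skeleton matched this message.
            buckets["unclassified"].append(msg)
    return dict(buckets)

skeletons = {
    "pharma": re.compile(r"Buy cheap .* at http://"),
    "lottery": re.compile(r"You have won .* claim it"),
}
mixed = [
    "Buy cheap pills at http://a.example.com/",
    "You have won 500 EUR, claim it today",
    "Buy cheap meds at http://b.example.com/",
    "Quarterly report attached",
]
buckets = precluster(mixed, skeletons)
```

Each bucket can then be handed to signature generation separately, so that messages from different templates are not mixed into one signature.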

Experimental Results • Requirements of a good spam filter: • Safe: does not classify legitimate mail as spam • Low false positive rate • Effective: correctly identifies the targeted class of spam • Low false negative rate
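The two requirements translate directly into two rates. A small sketch, with made-up signatures and messages, of how they could be computed:

```python
import re

def error_rates(signatures, spam, legit):
    """Return (false_positive_rate, false_negative_rate).

    False positive: a legitimate message matched by some signature.
    False negative: a spam message matched by no signature.
    """
    def hit(msg):
        return any(sig.search(msg) for sig in signatures)

    false_pos = sum(1 for m in legit if hit(m))
    false_neg = sum(1 for m in spam if not hit(m))
    return false_pos / len(legit), false_neg / len(spam)

signatures = [re.compile(r"Buy cheap \w+ now")]
spam = ["Buy cheap pills now", "Buy cheap watches now",
        "Win cash today", "Buy cheap meds now"]
legit = ["Lunch at noon?", "Buy milk on the way home"]
fp_rate, fn_rate = error_rates(signatures, spam, legit)
```

Here the signature is safe on this tiny sample (no legitimate mail matched) but misses one of the four spam messages.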

Experimental Results (cont’d) • Testing: 4 tiers • Signature safety • Signatures from the 3 other tiers were run against legitimate mail ‘corpora’ to assess the false positive rate • To prevent age bias, the signatures were tested only on the subject and body of the corpora

Experimental Results (cont’d) • Controlled single template inference • Generated 5000 instances of spam from a ‘Storm’ bot from templates gained through reverse engineering • 1000 for signature generation • 4000 for testing false negative rate • Done for each of 10,676 templates (53,380,000 messages) • Results: • Also, at k = 1000 (training buffer size), the false positive rate was 0% for all signatures

Experimental Results (cont’d) • Controlled multi-template inference • Spam used for testing was generated during the Botlab project at the University of Washington • 4 bots used: 1 each from the Mega-D, Pushdo, Rustock, and Srizbi botnets • First million messages from each split into training and testing sets, then Judo run chronologically on each test message • A test message counts as a true match if it matches a signature generated from previous test messages • Otherwise it is counted as a false negative
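The chronological scoring described above can be sketched as a single pass over the message stream. The buffer handling and the toy `infer_prefix_signature` function (which just uses the longest common prefix) are assumptions made to keep the example runnable; they stand in for real signature generation.

```python
import os
import re

def infer_prefix_signature(buffer):
    """Toy inference: the longest common prefix of the buffered messages.
    An empty prefix would match everything, so real inference needs more care."""
    return re.compile(re.escape(os.path.commonprefix(buffer)))

def run_chronologically(stream, k, infer):
    """Score each message against signatures built from earlier messages only."""
    signatures, buffer = [], []
    matched = missed = 0
    for msg in stream:
        if any(sig.match(msg) for sig in signatures):
            matched += 1          # true match
            continue
        missed += 1               # false negative
        buffer.append(msg)
        if len(buffer) >= k:      # enough unmatched mail: build a signature
            signatures.append(infer(buffer))
            buffer = []
    return matched, missed

stream = [
    "Buy pills A", "Buy pills B", "Buy pills C", "Buy pills D",
    "Win cash 1", "Win cash 2", "Win cash 3",
]
matched, missed = run_chronologically(stream, k=2, infer=infer_prefix_signature)
```

In this toy run, the first `k` messages of each new template are unavoidably missed, which is why the buffer size affects the false negative rate.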

Experimental Results (cont’d) • Results: • Only false positives from Rustock bot tests

Experimental Results (cont’d) • Real world deployment: • 2xXarvester + 2xMega-D + 4xRustock + 6xGheg = 14 bots • Messages generated: • Ran the test as in multi-template runs

Experimental Results (cont’d) • Results: • Worst Case: Rustock again only source of false positives: 1 in 12,500 messages. All others 0 total false positives in corpora

Experimental Results (cont’d) • Efficiency: Since the goal of the project was an accurate RE generator, efficiency wasn’t a priority • Initial RE generation using buffer size 50 with 6000 character length messages takes about 2 sec using an average desktop circa 2009 • Signature updates at ~ 50-100 ms

Response Time • Based on the message output rate of the bot(s) generating the spam • May be complicated by the existence of multiple bots or templates • Bots used in this experiment generated > 100 spam messages per minute • Since results are acceptable from k >= 500, it should only take a few minutes to generate a working signature
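The "few minutes" estimate follows from simple arithmetic, sketched below under the assumption that a single bot sends a single template, so every message goes into the same training buffer. The function name is made up for this example.

```python
def minutes_to_fill_buffer(k, messages_per_minute):
    """Time needed to collect k training messages at a steady send rate."""
    return k / messages_per_minute

# At the slide's lower bound of 100 messages per minute:
at_k_500 = minutes_to_fill_buffer(500, 100)
at_k_1000 = minutes_to_fill_buffer(1000, 100)
```

Because the bots sent more than 100 messages per minute, these figures are upper bounds for the single-template case. With several templates interleaved, each buffer fills more slowly.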

Overview • ‘Judo’ is basically a learning spam filter • Content based • Requires training to produce effective signatures • Safe and Effective (both greater than 99.75%) • Controlled tests show exceptional results • Simulated real world tests show promise, but could be worked around by bots that can randomly generate new templates

Any Questions?
