Detecting Adversaries Using Metafeatures

Chad Mills, Program Manager, Windows Live Safety Platform, Microsoft

Example Messages

Content Filter
Assumption:
• Spam words continue to appear in spam messages
• Good words continue to appear in good messages
Spam message: million dollars transfer guardian
(dollars, 0.2) (million, 0.1) (transfer, 0.1) (guardian, 0.03)
Score: 0.37
Good message: March community social fellow
(community, -0.01) (social, -0.01) (fellow, -0.01) (March, -0.08)
Score: -0.11
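The content filter example can be read as a linear model in which a message's score is the sum of its word weights. The following is a minimal sketch under that reading, not the production filter: the `WEIGHTS` table reuses the slide's illustrative values, and `content_score` and the rule that unknown words contribute 0 are assumptions made for illustration.

```python
# Illustrative per-word weights copied from the slide (positive = spammy).
WEIGHTS = {
    "dollars": 0.2, "million": 0.1, "transfer": 0.1, "guardian": 0.03,
    "community": -0.01, "social": -0.01, "fellow": -0.01, "March": -0.08,
}

def content_score(words, weights=WEIGHTS):
    """Sum the weights of known words; unknown words add 0 (an assumption)."""
    return sum(weights.get(w, 0.0) for w in words)

good = content_score(["March", "community", "social", "fellow"])
spam = content_score(["million", "dollars", "transfer", "guardian"])

print(round(good, 2))  # -0.11
print(spam > good)     # True
```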

<style> … opens NRSU syringe /> Korean relations header greeting Airllines Phantom CVS Rae 504 1009 perf undertaking paced Liquidation reduction /> … From: "Chelsea Clark" <easyMoneySurveys@pointracer.com> Subject: Get PaidFor yourOpinion

Simplified Example Chaff

Finding good words
• Good Message + Spammy Words (Free, Nigeria, Viagra) = Borderline Spam Message
• Borderline Spam + Unknown Words (late, click, commissioner) = Inbox: the unknown words are Good Words
• Borderline Spam + Unknown Words (newsletter, select, month) = Junk Folder: the unknown words are Non-Good Words
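The probing procedure on this slide can be sketched as a loop over batches of unknown words. This is a toy illustration only: `TOY_WEIGHTS`, `JUNK_THRESHOLD`, `lands_in_inbox`, and `find_good_words` are hypothetical names, the weights are made up, and a real adversary would observe delivery of test messages rather than call a function.

```python
# Toy linear filter standing in for the real one; all weights are made up.
TOY_WEIGHTS = {
    "free": 0.2, "nigeria": 0.2, "viagra": 0.2,            # spammy words
    "hello": -0.1, "meeting": -0.15,                        # good-message words
    "late": -0.05, "click": -0.03, "commissioner": -0.04,   # candidates
    "newsletter": 0.0, "select": 0.01, "month": 0.0,        # candidates
}
JUNK_THRESHOLD = 0.3  # hypothetical cutoff: scores above it go to Junk

def lands_in_inbox(words):
    """Return True if the toy filter would deliver the message to the inbox."""
    score = sum(TOY_WEIGHTS.get(w, 0.0) for w in words)
    return score <= JUNK_THRESHOLD

def find_good_words(borderline, candidate_batches, delivered):
    """Keep each batch of unknown words that tips a borderline spam into the inbox."""
    good = []
    for batch in candidate_batches:
        if delivered(borderline + batch):
            good.extend(batch)
    return good

# A good message plus spammy words, scoring just above the junk threshold.
borderline = ["hello", "meeting", "free", "nigeria", "viagra"]
batches = [["late", "click", "commissioner"], ["newsletter", "select", "month"]]

found = find_good_words(borderline, batches, lands_in_inbox)
print(found)  # ['late', 'click', 'commissioner']
```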

Spam: better than legitimate mail?

Application: Chaff
Chaff Spam
• [spam content]
• newsletter peers month select these
• late click commissioner media
• smoothly off close support before
• okay sponsor rock goby ads
• none cases text membership
Legitimate Mail
March is all about the Zune community. This month, you can help create a new feature for The Social, get tips from a fellow Zune user and find out the winners of the Your Zune Your Choice Awards.

Messages with great scores

Example Metafeatures
• Sum of weights (content filter score)
• Average weight
• Standard Deviation
• Percent of words that are good
• Percent of words that are spam
• Number of features
• Maximum feature weight
• Number of strong spam words
• Etc.
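The metafeatures listed here are summary statistics over the weights of the words in a single message. A minimal sketch follows, with several assumptions that the slides do not state: a word counts as "good" if its weight is negative and "spam" if positive, the standard deviation is the population form, and the `strong_spam` cutoff of 0.15 is a hypothetical value.

```python
import statistics

def metafeatures(weights, strong_spam=0.15):
    """Summarize one message's word weights (positive = spammy).

    The sign convention and the strong_spam cutoff are assumptions.
    """
    if not weights:
        raise ValueError("message has no known features")
    n = len(weights)
    total = sum(weights)
    return {
        "sum": total,                                  # content filter score
        "average": total / n,
        "std_dev": statistics.pstdev(weights),         # population std. dev.
        "pct_good": sum(1 for w in weights if w < 0) / n,
        "pct_spam": sum(1 for w in weights if w > 0) / n,
        "num_features": n,
        "max_weight": max(weights),
        "num_strong_spam": sum(1 for w in weights if w >= strong_spam),
    }

# Spam words padded with good-word chaff, using illustrative weights.
chaff_spam = [0.2, 0.1, 0.1, 0.03, -0.01, -0.01, -0.01, -0.08]
mf = metafeatures(chaff_spam)
print(mf["num_features"], mf["pct_good"], mf["max_weight"])  # 8 0.5 0.2
```

A message that mixes strong spam words with many mildly good words shows up here as a wide spread of weights, which the sum alone hides.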

Metafeatures
Features are (feature, weight) pairs; metafeatures are (Metafeature, weight) pairs.
Message: million dollars transfer guardian
• Features: (dollars, 0.2) (million, 0.1) (transfer, 0.1) (guardian, 0.03)
• Metafeatures: Sum: 0.37, σ: 0.09, Max: 0.2
• Metafeature weights: (Sum: 0.37, 1.0) (σ: 0.09, 0.8) (Max: 0.2, 0.1)
• Metafeature score: 1.9
Message: March community social fellow
• Features: (community, -0.01) (social, -0.01) (fellow, -0.01) (March, -0.08)
• Metafeatures: Sum: -0.11, σ: 0.04, Max: -0.1
• Metafeature weights: (Sum: -0.11, -0.8) (σ: 0.04, -0.6) (Max: -0.1, -0.3)
• Metafeature score: -1.7
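In this example the second-stage score is the sum of the weights attached to each metafeature value (1.0 + 0.8 + 0.1 = 1.9 and -0.8 - 0.6 - 0.3 = -1.7). The sketch below reproduces that arithmetic with a plain lookup table. The table keys, the `metafeature_score` name, and the exact-match lookup are assumptions for illustration; how the real system maps continuous metafeature values to weights is not stated in the slides.

```python
# (metafeature name, value) -> weight, copied from the slide's example.
METAFEATURE_WEIGHTS = {
    ("sum", 0.37): 1.0, ("std", 0.09): 0.8, ("max", 0.2): 0.1,
    ("sum", -0.11): -0.8, ("std", 0.04): -0.6, ("max", -0.1): -0.3,
}

def metafeature_score(mf, table=METAFEATURE_WEIGHTS):
    """Sum the weights of a message's metafeature values (unknown values add 0)."""
    return sum(table.get((name, value), 0.0) for name, value in mf.items())

spam_mf = {"sum": 0.37, "std": 0.09, "max": 0.2}
good_mf = {"sum": -0.11, "std": 0.04, "max": -0.1}

print(round(metafeature_score(spam_mf), 1))  # 1.9
print(round(metafeature_score(good_mf), 1))  # -1.7
```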

Evaluation Data
• Hotmail Feedback Loop
  • Messages classified by recipients
• Training Set: 1,800,000 messages
  • Ending on 5/20/07
• Evaluation Set: 50,000 messages
  • Data from 5/21/07

Evaluation Results
• 45% improvement in TP (true positives) at low FP (false positive) levels
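One way to measure "TP at low FP levels" is to fix a maximum false positive rate and report the best true positive rate any threshold achieves within it. The sketch below does this on toy data. The function names and scores are hypothetical, and treating the improvement as relative to the baseline is an assumption, since the slides do not say how the 45% was computed.

```python
def tp_rate_at_fp(scores, is_spam, max_fp_rate):
    """Best true positive rate over thresholds whose FP rate is <= max_fp_rate."""
    n_spam = sum(1 for y in is_spam if y)
    n_good = len(is_spam) - n_spam
    best = 0.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, is_spam) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, is_spam) if s >= t and not y)
        if fp / n_good <= max_fp_rate:
            best = max(best, tp / n_spam)
    return best

def relative_improvement(baseline, improved):
    """Relative gain over the baseline, e.g. 0.45 for a 45% improvement."""
    return (improved - baseline) / baseline

# Toy data: five spam messages followed by five good messages.
labels = [True] * 5 + [False] * 5
base_scores = [0.9, 0.8, 0.3, 0.2, 0.1, 0.5, 0.05, 0.04, 0.03, 0.02]
meta_scores = [0.9, 0.8, 0.7, 0.6, 0.1, 0.5, 0.05, 0.04, 0.03, 0.02]

base_tp = tp_rate_at_fp(base_scores, labels, max_fp_rate=0.0)
meta_tp = tp_rate_at_fp(meta_scores, labels, max_fp_rate=0.0)
print(base_tp, meta_tp)  # 0.4 0.8
```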

Qualitative Results
• At a reasonable False Positive rate:
  • 98% of unique catches are chaff spam
  • Caught 99.5% of chaff spam missed by regular content filter
  • Similar types of False Positives as regular filter
• Challenges Remaining
  • Primarily just helped on spam with chaff
  • Relies on base content filter to detect spam with obfuscated content (e.g. v1agra) or naïve spam without any chaff

Conclusions
• Spam messages with good word chaff have unnatural weight distributions
• Metafeatures are able to identify and catch these messages
• This resulted in a 45% improvement in TP
• Gains were limited to spam with good word chaff
