Detecting Adversaries Using Metafeatures

Detecting Adversaries Using Metafeatures. Chad Mills, Program Manager, Windows Live Safety Platform, Microsoft. Example Messages. Content Filter. Assumption: Spam words continue to appear in spam messages; good words continue to appear in good messages. million dollars transfer


Presentation Transcript


  1. Detecting Adversaries Using Metafeatures
     Chad Mills, Program Manager, Windows Live Safety Platform, Microsoft

  2. Example Messages

  3. Content Filter
     Assumption:
     • Spam words continue to appear in spam messages
     • Good words continue to appear in good messages
     Word weights (feature, weight): (dollars, 0.2) (million, 0.1) (transfer, 0.1) (guardian, 0.03) (community, -0.01) (social, -0.01) (fellow, -0.01) (March, -0.08)
     • Message with the words million, dollars, transfer, guardian: score 0.37
     • Message with the words March, community, social, fellow: score -0.11
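The scoring on this slide can be sketched as a plain sum of per-word weights. This is a minimal illustration, not the production filter: the function name `content_score` and the rule that unknown words contribute 0 are assumptions. The weights are the ones listed on the slide. The four good-word weights sum to the slide's -0.11; the four spam-word weights listed sum to 0.43 rather than the 0.37 shown, so the slide's figure presumably reflects details not captured in this transcript.

```python
# Word weights as listed on the slide (positive = spammy, negative = good).
WEIGHTS = {
    "dollars": 0.2, "million": 0.1, "transfer": 0.1, "guardian": 0.03,
    "community": -0.01, "social": -0.01, "fellow": -0.01, "March": -0.08,
}

def content_score(words, weights=WEIGHTS):
    """Sum the weights of known words; unknown words add 0 (an assumption)."""
    return sum(weights.get(w, 0.0) for w in words)

good = content_score(["March", "community", "social", "fellow"])
spam = content_score(["million", "dollars", "transfer", "guardian"])
print(round(good, 2))  # -0.11
print(round(spam, 2))  # 0.43
```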

  4. <style> … <br Bij board bar atteindre jYST GCS re sonrisa fuse Kiviuq padded /> <br Star Honolulu /> <br Ons apporter /> opens NRSU syringe /> <br Jerusalem comfort HTTPS 2604 confidence Miles /> <br 27 mails Qty backwards Meditations bans sedative ect salve <br insightful /> Korean relations header greeting Airllines Phantom CVS Rae 504 1009 perf<br graphiques /> undertaking paced Liquidation reduction /> … From: "Chelsea Clark" <easyMoneySurveys@pointracer.com> Subject: Get PaidFor yourOpinion

  5. Simplified Example Chaff

  6. Finding good words
     • Spammy Words (Free, Nigeria, Viagra) + Good Message = Borderline Spam Message
     • Borderline Spam + Unknown Words (late, click, commissioner) = Inbox: Good Words
     • Borderline Spam + Unknown Words (newsletter, select, month) = Junk Folder: Non-Good Words
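The probing procedure on this slide can be sketched as follows: a sender takes a borderline spam message, appends a batch of unknown words, and observes where the message lands. The filter internals here (`HIDDEN_WEIGHTS`, `THRESHOLD`, the words `meeting` and `thanks`, and every number) are invented for illustration; only the word batches and the inbox/junk outcome come from the slide.

```python
# Hypothetical filter internals, hidden from the sender; every number is invented.
HIDDEN_WEIGHTS = {
    "free": 0.3, "nigeria": 0.3, "viagra": 0.4,      # spammy words
    "meeting": -0.5, "thanks": -0.45,                # words from a good message
    "late": -0.03, "click": -0.02, "commissioner": -0.02,
    "newsletter": 0.01, "select": 0.0, "month": 0.0,
}
THRESHOLD = 0.0  # assumed rule: scores above this go to the junk folder

def classify(words):
    """Return the folder the hypothetical filter would choose."""
    score = sum(HIDDEN_WEIGHTS.get(w, 0.0) for w in words)
    return "junk" if score > THRESHOLD else "inbox"

def probe(borderline, batches):
    """Split word batches by whether adding them gets the message delivered."""
    good, non_good = [], []
    for batch in batches:
        if classify(borderline + batch) == "inbox":
            good.append(batch)
        else:
            non_good.append(batch)
    return good, non_good

# Spammy words plus a good message give a message just over the threshold.
borderline = ["free", "nigeria", "viagra", "meeting", "thanks"]
batches = [["late", "click", "commissioner"], ["newsletter", "select", "month"]]
good_words, non_good_words = probe(borderline, batches)
print(good_words)
print(non_good_words)
```

The sender never sees the weights; the delivery outcome alone leaks which batches push the score down.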

  7. Spam: better than legitimate mail?

  8. Application: Chaff
     Chaff Spam
     • [spam content]
     • newsletter peers month select these
     • late click commissioner media
     • smoothly off close support before
     • okay sponsor rock goby ads
     • none cases text membership
     Legitimate Mail
     March is all about the Zune community. This month, you can help create a new feature for The Social, get tips from a fellow Zune user and find out the winners of the Your Zune Your Choice Awards.

  9. Messages with great scores

  10. Messages with great scores

  11. Example Metafeatures
     • Sum of weights (content filter score)
     • Average weight
     • Standard Deviation
     • Percent of words that are good
     • Percent of words that are spam
     • Number of features
     • Maximum feature weight
     • Number of strong spam words
     • Etc.
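A minimal sketch of how the listed metafeatures could be computed from the weights of the words found in one message. Several details are assumptions rather than anything stated in the slides: a word counts as "good" when its weight is negative and "spam" when positive, the standard deviation is the population form, and the `strong_spam` cutoff of 0.15 is a made-up value.

```python
from statistics import mean, pstdev

def metafeatures(weights, strong_spam=0.15):
    """Summarize the distribution of per-word weights for one message."""
    n = len(weights)
    return {
        "sum": sum(weights),                      # content filter score
        "mean": mean(weights),                    # average weight
        "std": pstdev(weights),                   # population std (assumption)
        "pct_good": sum(w < 0 for w in weights) / n,
        "pct_spam": sum(w > 0 for w in weights) / n,
        "count": n,                               # number of features
        "max": max(weights),                      # maximum feature weight
        "strong_spam": sum(w >= strong_spam for w in weights),
    }

# Spam words followed by good-word chaff, using the weights from the slides.
chaffed = [0.2, 0.1, 0.1, 0.03, -0.08, -0.01, -0.01, -0.01]
for name, value in metafeatures(chaffed).items():
    print(name, value)
```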

  12. Metafeatures
     Features (feature, weight): (dollars, 0.2) (million, 0.1) (transfer, 0.1) (guardian, 0.03) (community, -0.01) (social, -0.01) (fellow, -0.01) (March, -0.08)
     Message with the words million, dollars, transfer, guardian:
     • Metafeatures: Sum: 0.37, σ: 0.09, Max: 0.2
     • (Metafeature, weight): (Sum: 0.37, 1.0) (σ: 0.09, 0.8) (Max: 0.2, 0.1)
     • Score: 1.9
     Message with the words March, community, social, fellow:
     • Metafeatures: Sum: -0.11, σ: 0.04, Max: -0.1
     • (Metafeature, weight): (Sum: -0.11, -0.8) (σ: 0.04, -0.6) (Max: -0.1, -0.3)
     • Score: -1.7
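The final scores on this slide equal the sum of the weights attached to each (metafeature, value) pair: 1.0 + 0.8 + 0.1 = 1.9 and -0.8 - 0.6 - 0.3 = -1.7. The sketch below reproduces that arithmetic with a lookup table. Keying on exact values is a simplification made for this example; a real system would presumably bucket metafeature values, and the names `META_WEIGHTS` and `metafeature_score` are hypothetical.

```python
# (metafeature, value) -> weight, copied from the slide. "Std" stands in for σ.
META_WEIGHTS = {
    ("Sum", 0.37): 1.0, ("Std", 0.09): 0.8, ("Max", 0.2): 0.1,
    ("Sum", -0.11): -0.8, ("Std", 0.04): -0.6, ("Max", -0.1): -0.3,
}

def metafeature_score(observed, weights=META_WEIGHTS):
    """Sum the weights of the observed (metafeature, value) pairs."""
    return sum(weights.get(pair, 0.0) for pair in observed.items())

first = metafeature_score({"Sum": 0.37, "Std": 0.09, "Max": 0.2})
second = metafeature_score({"Sum": -0.11, "Std": 0.04, "Max": -0.1})
print(round(first, 1))   # 1.9
print(round(second, 1))  # -1.7
```

The second-level model has the same linear form as the content filter; only the features change from words to properties of the weight distribution.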

  13. Evaluation Data
     • Hotmail Feedback Loop
       • Messages classified by recipients
     • Training Set: 1,800,000 messages
       • Ending on 5/20/07
     • Evaluation Set: 50,000 messages
       • Data from 5/21/07

  14. Evaluation Results 45% improvement in TP at low FP levels
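A sketch of one way to measure the true-positive (spam catch) rate at a fixed false-positive budget, which is the kind of comparison this slide summarizes. The function `tp_rate_at_fp`, the scores, and the labels are all toy values invented for illustration; they are not the evaluation data, and the slides do not say whether the 45% figure is a relative or absolute change.

```python
def tp_rate_at_fp(scores, labels, max_fp_rate):
    """Best spam catch rate over thresholds whose FP rate stays within budget."""
    spam = [s for s, y in zip(scores, labels) if y == 1]
    good = [s for s, y in zip(scores, labels) if y == 0]
    best = 0.0
    for t in sorted(set(scores)):
        fp_rate = sum(s >= t for s in good) / len(good)
        if fp_rate <= max_fp_rate:
            best = max(best, sum(s >= t for s in spam) / len(spam))
    return best

# Toy data: 1 = spam, 0 = good mail. The third message is spam hidden by chaff.
labels      = [1,   1,   1,   1,   0,   0,    0,   0]
base_scores = [0.9, 0.8, 0.2, 0.1, 0.3, 0.05, 0.0, -0.1]
meta_scores = [0.9, 0.8, 0.7, 0.1, 0.3, 0.05, 0.0, -0.1]

base_tp = tp_rate_at_fp(base_scores, labels, max_fp_rate=0.0)
meta_tp = tp_rate_at_fp(meta_scores, labels, max_fp_rate=0.0)
print(base_tp, meta_tp)
```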

  15. Qualitative Results
     • At a reasonable False Positive rate:
       • 98% of unique catches are chaff spam
       • Caught 99.5% of chaff spam missed by regular content filter
       • Similar types of False Positives as regular filter
     • Challenges Remaining
       • Primarily just helped on spam with chaff
       • Relies on base content filter to detect spam with obfuscated content (e.g. v1agra) or naïve spam without any chaff

  16. Conclusions
     • Spam messages with good word chaff have unnatural weight distributions
     • Metafeatures are able to identify and catch these messages
     • This resulted in a 45% improvement in TP
     • Gains were limited to spam with good word chaff
