Gianluca Stringhini, Manuel Egele, Apostolis Zarras, Thorsten Holz, Christopher Kruegel, and Giovanni Vigna. B@bel: Leveraging Email Delivery for Spam Mitigation. University of California, Santa Barbara; Ruhr-University Bochum. USENIX Security 2012.
李佳恆 leegoder@gmail.com
Outline • Introduction • Background • Approach • Evaluation • Conclusion
Introduction • KASPERSKY LAB. Spam Report: April 2012. • Email spam accounts for more than 77% of all email traffic https://www.securelist.com/en/analysis/204792230/Spam_Report_April_2012
SYMANTEC CORP. State of spam & phishing report About 85% of world-wide spam traffic is sent by botnets http://www.symantec.com/business/theme.jsp?themeid=state_of_spam
Traditional spam detection systems 1. Content analysis 2. Origin-based (e.g., blacklists) ============ new way ========== Focus on the email delivery mechanism (how messages are sent by spammers)
SMTP [Figure: SMTP mail flow, from Wikipedia. Labelled examples: Hotmail; Outlook (mail user agent); msa.hinet (mail transfer agent)]
SMTP
Reply:220 msr5.hinet.net ESMTP Sendmail 8.14.2/8.14.2; Sun, 29 Jul 2012 17:38:35 +0800 (CST)
EHLO adl.com
Reply:250-msr5.hinet.net Hello 114-34-35-96.HINET-IP.hinet.net [114.34.35.96], pleased to meet you
MAIL FrOm:<dada@msa.hinet.net>
Reply:250 2.1.0 <dada@msa.hinet.net>... Sender ok
rCpt tO: <leegoder@gmail.com>
Reply:250 2.1.5 <leegoder@gmail.com>... Recipient ok
Data
Reply:354 Enter mail, end with "." on a line by itself
SubJECT : HI i am dada YOYOYO test !!!`~~~~
...
.
Reply:250 2.0.0 q6T9cZtc012399 Message accepted for delivery
SMTP RFC 821 • The SMTP RFC defines 14 commands. • Each command consists of a four-character, case-insensitive, alphabetic command code plus optional arguments • One or more space characters separate the command code from its arguments • All commands are terminated by a line terminator (<CR><LF>) • SMTP replies: three-digit status code + space + description (one line, e.g., 250 OK)
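The format rules above can be sketched as two small parsers. This is a minimal illustration, not the parser used by B@bel; the names `COMMAND_RE`, `REPLY_RE`, `parse_command`, and `parse_reply` are assumptions made for this example, and multi-line replies (such as `250-...`) are deliberately not handled.

```python
import re

# Command line: four alphabetic characters (any case), optional arguments
# separated by one or more spaces, terminated by <CR><LF>.
COMMAND_RE = re.compile(r"^([A-Za-z]{4})(?: +(.*))?\r\n$")

# Single-line reply: three-digit status code, one space, description, <CR><LF>.
REPLY_RE = re.compile(r"^(\d{3}) (.*)\r\n$")


def parse_command(line):
    """Return (code, arguments) or None if the line breaks the format."""
    match = COMMAND_RE.match(line)
    if match is None:
        return None
    # Command codes are case-insensitive, so normalise them to upper case.
    return match.group(1).upper(), match.group(2) or ""


def parse_reply(line):
    """Return (status, description) or None if the line breaks the format."""
    match = REPLY_RE.match(line)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)
```

Only the command code is normalised; the arguments keep their original spelling, since variations such as `FrOm` versus `FROM` are exactly what distinguishes one client from another.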
SMTP Dialects • Different clients might implement the SMTP protocol in slightly different ways. 1. The RFCs do not always prescribe a single format (e.g., EHLO vs. HELO) 2. Clients using different extensions might add different parameters 3. Servers accept commands that do not comply with the strict SMTP definitions
Learning Dialects • Passive observation • A set of SMTP conversations • Each conversation is a sequence of <reply, command> pairs, e.g., <220 hinet.net, EHLO adl.com> • Active probing • Send specifically crafted replies to a client • Observe its responses
Active probing • Standard SMTP replies (e.g., send an error) • Additional SMTP replies (e.g., send a reply twice) • Out-of-order SMTP replies • Missing replies (never send a reply to a command) • Compliant replies (e.g., hOsT) • Incorrect replies (e.g., 9999) • Incorrectly terminated replies (e.g., <CR><CR>)
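Some of the probe types listed above can be sketched as a table of reply sequences. This is only an illustration under stated assumptions: the function name `probe_variants` and the dictionary keys are hypothetical, and the out-of-order and compliant variants are omitted because the slide does not specify them precisely.

```python
def probe_variants(code=250, text="OK"):
    """Build example server reply sequences for several probe types.

    Each value is the list of raw reply strings the server would send
    in response to one client command.
    """
    standard = f"{code} {text}\r\n"
    return {
        # A well-formed reply, sent once.
        "standard": [standard],
        # The same reply sent twice in a row.
        "additional": [standard, standard],
        # No reply at all: the client is left waiting.
        "missing": [],
        # A status code that is not three digits long.
        "incorrect": [f"9999 {text}\r\n"],
        # A reply terminated by <CR><CR> instead of <CR><LF>.
        "incorrectly_terminated": [f"{code} {text}\r\r"],
    }
```

A learner would send each variant to a client and record the command the client issues next, which extends the set of <reply, command> pairs beyond what passive observation yields.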
Regular expressions • Messages: MAIL FROM:<dada@msa.hinet.net>, MAIL FROM:gaga@msa.hinet.net • Template: MAIL FROM:<email-addr> • Message: Mail From :gaga@msa.hinet.net • Template: Mail From :<email-addr> • E.g., <220 hinet.net, EHLO adl.com> becomes <220 hostname, EHLO domain>
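Turning concrete messages into templates can be sketched with regular-expression substitution. This is a simplified illustration, not the paper's implementation: the patterns `EMAIL` and `NAME` are naive assumptions, the function names are hypothetical, and here the placeholder replaces only the address itself, so any angle brackets in the original message stay in the template as literal characters.

```python
import re

# Naive patterns, sufficient for the slide examples only.
EMAIL = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"
NAME = r"[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"


def generalize_command(command):
    """Replace variable fields of a client command with placeholders."""
    out = re.sub(EMAIL, "email-addr", command)
    # Whatever dotted name is left after the address pass is a domain.
    return re.sub(NAME, "domain", out)


def generalize_reply(reply):
    """Replace the server name in a reply with a placeholder."""
    return re.sub(NAME, "hostname", reply)


def generalize_pair(reply, command):
    """Turn one <reply, command> pair into its template form."""
    return generalize_reply(reply), generalize_command(command)
```

Keeping capitalisation, spacing, and brackets untouched matters, because two clients that differ only in those details should end up with different templates.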
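State machine • Each <reply, command> pair corresponds to a <transition, state> pair: the server reply labels the transition and the client command labels the state • E.g., <220 hostname, EHLO domain> • [Figure: example state machines for a spam bot and for Gmail]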
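Building such a state machine from observed conversations can be sketched as follows. This is a minimal sketch under assumptions: the `"START"` state name, the function name `build_state_machine`, and the dictionary representation are all choices made for this example rather than details from the paper.

```python
from collections import defaultdict


def build_state_machine(conversations):
    """Build a dialect state machine from templated conversations.

    conversations: list of conversations, each a list of
    (reply_template, command_template) pairs.
    Returns a dict mapping (state, reply) to the set of commands
    (next states) observed after that reply in that state.
    """
    transitions = defaultdict(set)
    for conversation in conversations:
        state = "START"  # hypothetical name for the initial state
        for reply, command in conversation:
            # The reply labels the transition; the command labels the state.
            transitions[(state, reply)].add(command)
            state = command
    return dict(transitions)


example_conversations = [
    [("220 hostname", "EHLO domain"), ("250 OK", "MAIL FROM:<email-addr>")],
    [("220 hostname", "EHLO domain"), ("250 OK", "RSET")],
]
example_machine = build_state_machine(example_conversations)
```

In `example_machine`, the state `EHLO domain` has two outgoing targets for the reply `250 OK`, one per observed conversation.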
Decision state machine • Wolf, W. An Algorithm for Nearly-Minimal Collapsing of Finite-State Machine Networks. ICCAD, 1990.
Making a decision • Input: a sequence of <reply, command> pairs • E.g., <220 hostname, EHLO domain> ... : C3, unknown, unknown, unknown • E.g., <220 hostname, HELO domain> <250 OK, MAIL FROM:<email-addr>> ... : C3, unknown • E.g., <220 hostname, HELO domain> <250 OK, RSET> ... : C2
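The decision step can be illustrated by narrowing a set of candidate dialects as each <reply, command> pair arrives. This is a simplified stand-in for the decision state machine, not the collapsing algorithm cited above; the dialect names `dialect_a` and `dialect_b`, the `"START"` state, and the result labels `"undecided"` and `"unrecognized"` are hypothetical.

```python
def classify(conversation, dialects):
    """Narrow down which known dialect a conversation belongs to.

    conversation: list of (reply_template, command_template) pairs.
    dialects: dict mapping a dialect name to a dict that maps
              (state, reply) to the command that dialect sends next.
    Returns a dialect name, "undecided" if several dialects still match,
    or "unrecognized" if no known dialect matches.
    """
    candidates = set(dialects)
    state = "START"
    for reply, command in conversation:
        # Keep only dialects that would send this command at this point.
        candidates = {
            name for name in candidates
            if dialects[name].get((state, reply)) == command
        }
        state = command
    if not candidates:
        return "unrecognized"
    if len(candidates) == 1:
        return next(iter(candidates))
    return "undecided"


known_dialects = {
    "dialect_a": {
        ("START", "220 hostname"): "HELO domain",
        ("HELO domain", "250 OK"): "RSET",
    },
    "dialect_b": {
        ("START", "220 hostname"): "HELO domain",
        ("HELO domain", "250 OK"): "MAIL FROM:<email-addr>",
    },
}
```

With `known_dialects`, a conversation that stops after `HELO domain` stays undecided, because both dialects start the same way; the second pair is what separates them.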
The Botnet Feedback Mechanism • Some spammers take server feedback into account, e.g., when a recipient address does not exist • Cutwail: 35% of the email addresses did not exist [38] • Providing false responses to spam emails • [38] http://www.iseclab.org/papers/cutwail-LEET11.pdf
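Providing a false response can be sketched as a server-side choice of reply to RCPT TO. This is an illustrative sketch only: the function name `rcpt_reply`, its parameters, and the reply texts are assumptions, and how the server decides that the sender is a bot is left outside the example.

```python
def rcpt_reply(recipient_exists, sender_is_bot):
    """Choose the server reply to a RCPT TO command.

    If the sender has been identified as a bot, claim that the
    recipient does not exist even when it does, so that a botnet
    which trusts server feedback may drop a valid address.
    """
    if sender_is_bot or not recipient_exists:
        # 550: requested action not taken, mailbox unavailable.
        return "550 5.1.1 User unknown\r\n"
    return "250 2.1.5 Recipient ok\r\n"
```

Legitimate senders still get truthful answers, so only the feedback seen by identified bots is poisoned.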
Environment • B@bel 1. Virtual machine zoo 2. Gateway 3. Learner => decision FSM => 4. Decision maker
Evaluating Dialects for Classification • Run B@bel • Training set (13 legitimate, 91 malware) • Dialects of legitimate MUAs and MTAs are distinct from those of bots • Legitimate MUAs and MTAs all speak distinct dialects (except for Outlook Express and Windows Live Mail) • 91 malware samples: 48 dialects • Samples that speak the same dialect belong to the same family
Evaluating Dialects for Spam Detection • Run B@bel • SMTP conversations for 621919 email messages (40 days) • 7114 bot samples [4] >> bad dialects • MUA + MTA + webmail >> good dialects • Passive spam detection • If the decision machine does not recognize the conversation >> mark as spam
Evaluating Dialects for Spam Detection • 621919 emails (all) • 260074 spam, 218675 ham, 143170 undecided • Verifying true positives: IP blacklists (30) + resolving the domain • 99.32% true positives • False negatives: 21% (misused webmail accounts, dedicated MTAs) (half are legitimate MTAs)
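As a quick consistency check on the counts above, the three classes can be summed and expressed as shares of the total. The variable names are chosen for this example; the numbers are the ones on the slide.

```python
# Counts taken from the slide.
total = 621919
counts = {"spam": 260074, "ham": 218675, "undecided": 143170}

# The three classes should account for every message.
classes_sum = sum(counts.values())

# Share of each class, in percent of all messages, rounded to two places.
shares = {name: round(100 * count / total, 2) for name, count in counts.items()}
```

The three counts add up exactly to the total, so every message falls into one of the three classes.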
Limitations and Evasion • Evading dialect detection: use an existing SMTP engine (CDO, the Collaboration Data Objects library) • But spambots are built for performance • Bagle (a spam bot): 20 ms per email • CDO (Windows): 200 ms per email
Conclusion • We introduced a novel way to detect and mitigate spam emails • We studied how the feedback mechanism used by botnets can be poisoned • Empirical results confirm that our approach can be used to detect and mitigate spam emails.