Talk (II): E-mail Spam: The Problem, Solutions and Potential Industrial Standards

Presentation Transcript


  1. Talk (II): E-mail Spam: The Problem, Solutions and Potential Industrial Standards Jenq-Haur Wang Academia Sinica Nov. 16-17, 2006

  2. Outline • Introduction • Existing Solutions • Regulatory Solutions • Technical Solutions • Potential Industrial Standards E-mail Spam

  3. Introduction • What is spam? • E-mail, netnews, instant messaging (“spim”), “Google-spam”, guestbook spam, Weblog comments spam, VoIP (“spit”), … • Unsolicited messages flooded to uninterested receivers, usually sent in bulk • What is e-mail spam? • Junk e-mail • Unsolicited bulk e-mail (UBE) • Unsolicited commercial e-mail (UCE)

  4. Spam Statistics • Jan. 2001: 8% of all e-mail traffic in the US is spam [Brightmail Inc.] • Jan. 2003: 42% [Brightmail Inc.] • Jul. 2004: 65% [Symantec (Brightmail) Inc.] • In 2002: 3 pieces/day/user (average) [Ferris Research] • By 2005: 10 pieces/day/user (average) [Ferris Research]

  5. Spam Statistics (cont.)

  6. Spam Statistics (cont.)

  7. Costs of Spam • Enterprises • > US$10 billion for US organizations in 2003 [Ferris Research] • US$245,000/year for a company with 14,000 employees [IDC] • End users • 5 spam messages/day at 30 seconds each -> about 15 hours/year [Ferris Research] • Loss of productivity • Burden on ISPs • System resource consumption on servers • Wasted network bandwidth • User complaints

  8. Latest Spam Statistics [source: Spam Statistics 2006, by Don Evett, TopTenReviews, Inc.] • Email considered spam: 40% • Daily spam emails sent: 12.4 billion • Daily spam received per person: 6 • Annual spam received per person: 2,200 • Spam cost to all non-corp. Internet users: $255 million • Spam cost to all US corporations in 2002: $8.9 billion • States with anti-spam laws: 26

  9. Latest Spam Statistics (cont.) • Email address changes due to spam: 16% • Estimated spam increase by 2007: 63% • Annual spam in 1,000 employee company: 2.1 million • Users who reply to spam email: 28% • Users who purchase from spam email: 8% • Corporate email that is considered spam: 15-20% • Wasted corporate time per spam email: 4-5 sec

  10. Email Statistics • Daily emails sent: 31 billion • Daily emails sent per email address: 56 • Daily emails sent per person: 174 • Daily emails sent per corporate user: 34 • Daily emails received per person: 10 • Email addresses per person: 3.1 average • Cost to all Internet users: $255 million

  11. Spam Categories • Products: 25% • Financial: 20% ↑ • Adult: 19% ↑ • Scams: 9% • Health: 7% • Internet: 7% • Leisure: 6% • Spiritual: 4% • Other: 3% (Source: http://www.brightmail.com/spamstats.html, Jun. 2004 & http://spam-filter-review.toptenreviews.com/spam-statistics.html, 2006)

  12. Origins of Spam • Where does the spam come from? [Sophos, “Dirty Dozen” spam producing countries, Apr. 2005] • 35.7% (43%): from the US • 25.0% ↑ (16%): from South Korea • 9.7% (11%): from China • …

  13. Major Factors • Simple SMTP mail relaying mechanism • Cannot verify the identity of the sender • Forged IP address/sender e-mail address • Open mail relay/proxy • Low cost for sending bulk e-mails • Low cost for e-mail address harvesting • Web, mailing list, … • Bulk mailer programs • Low cost for obtaining “free” e-mail addresses

  14. Lifecycle of E-mails [Diagram labels: sender, MUAs, SMTP, MTAs (queues), sender domain; DNS (MX records); SMTP, MTAr (mailbox), receiver domain; POP3/IMAP4, MUAr, recipient]

  15. Existing Solutions • Regulatory solutions • Anti-spam laws • Limitations • Technical solutions • Filtering • Postage • Disposable e-mail address

  16. Regulatory Solutions • Anti-spam laws • http://www.spamlaws.com/ • Ex: US federal law CAN-SPAM Act (S.877), effective Jan. 1, 2004 • Limitations • Dependence on technical evidence • Slow and costly process

  17. Current Status of Anti-Spam Laws • In the US: • Enacted federal laws: CAN-SPAM Act of 2003 (Pub. L. 108-187, S. 877) • Enacted state laws: Arkansas, California, Colorado, Connecticut, Delaware, Idaho, Illinois, Indiana, Iowa, Kansas, Louisiana, Maryland, Minnesota, Missouri, Nevada, New Mexico, North Carolina, Ohio, Oklahoma, Pennsylvania, Rhode Island, South Dakota, Tennessee, Utah, Virginia, Washington, West Virginia, Wisconsin, Wyoming, … • In Europe: • European Union, Austria, Belgium, Czech Republic, Denmark, Finland, France, Germany, Greece, Ireland, Italy, Luxembourg, Netherlands, Norway, Portugal, Spain, Sweden, United Kingdom, … • In other countries: • Argentina, Australia, Brazil, Canada, India, Japan, Panama, Peru, Russia, South Korea, Yugoslavia, … • Taiwan: “Anti-Hacker” laws in the Martial Law (Jun. 3, 2003)

  18. Technical Solutions • Filtering: to separate bad from good • Heuristic-based • Classification-based: machine learning • Others: peer-to-peer, honeypot • Postage: to increase the cost of sending e-mails • Hiding email address • Encoding (text to image, JavaScript, …) • Disposable email address: separate e-mail address for different correspondence • Enhancing SMTP mechanism • Email path verification • Authenticated SMTP

  19. Filtering Technique: Heuristic-based • Black/White/Grey lists • Blacklist: lists of IP addresses that send spam • RBLs (Real-time Blackhole Lists), open mail relays, open proxies, … • Whitelist: lists of trusted senders • Challenge-response mechanism • Greylisting: temporary delay of e-mail from unknown senders • Problems • Easy to make mistakes • Forged IP address/sender e-mail address • Lists need to be updated frequently • Changing spammer e-mail addresses
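
The three list types can be sketched in a few lines. This is an illustration only, not any particular mail server's implementation: the class name `ListFilter`, the 300-second delay, and the SMTP reply strings are assumptions, and the greylist is keyed on the (client IP, envelope sender, recipient) triplet, which is one common choice.

```python
import time


class ListFilter:
    """Toy whitelist / blacklist / greylist check (hypothetical helper)."""

    def __init__(self, blacklist_ips=(), whitelist_senders=(),
                 min_delay=300.0, clock=time.time):
        self.blacklist = set(blacklist_ips)
        self.whitelist = {s.lower() for s in whitelist_senders}
        self.min_delay = min_delay   # seconds an unknown triplet must wait (assumed)
        self.clock = clock           # injectable clock, so the sketch can be tested
        self.first_seen = {}         # triplet -> time of the first delivery attempt

    def check(self, client_ip, mail_from, rcpt_to):
        """Return an SMTP-style reply string for one delivery attempt."""
        sender = mail_from.lower()
        if sender in self.whitelist:
            return "250 ok (whitelisted)"
        if client_ip in self.blacklist:
            return "550 rejected (blacklisted)"
        triplet = (client_ip, sender, rcpt_to.lower())
        now = self.clock()
        seen = self.first_seen.setdefault(triplet, now)
        if now - seen < self.min_delay:
            # Temporary failure: a well-behaved MTA queues the mail and retries.
            return "451 try again later (greylisted)"
        return "250 ok"
```

Note that the whitelist check trusts the sender address, which the slide lists as forgeable; that weakness is visible directly in the code.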

  20. Filtering Technique: Heuristic-based (cont.) • Keyword-matching rules (ex. MS Outlook) • Look for similar messages based on their subject or content • Problems • Exact rules are difficult to formulate and maintain • Spam is always changing • Chinese menu (madlibs) attack • Ex: “Make thousands of dollars working at home !!!” vs. “Earn lots of money in the comfort of your own house.”
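
A rough sketch of why the "Chinese menu" attack defeats exact keyword rules: the spammer picks one phrase per slot, so a rule written for one combination misses the rest. The rule and the third phrase in each slot are made up for illustration; the first two phrases per slot come from the example on the slide.

```python
import re
from itertools import product

# One exact-phrase rule, as a naive filter might define it (illustrative).
RULE = re.compile(r"make thousands of dollars working at home", re.IGNORECASE)

# "Menu" of interchangeable phrases; one entry is chosen from each slot.
SLOTS = [
    ["Make", "Earn", "Get"],
    ["thousands of dollars", "lots of money", "serious cash"],
    ["working at home", "in the comfort of your own house", "from your living room"],
]


def generate_variants(slots):
    """Return every message the menu can produce."""
    return [" ".join(choice) for choice in product(*slots)]


variants = generate_variants(SLOTS)
caught = [v for v in variants if RULE.search(v)]
missed = [v for v in variants if not RULE.search(v)]
```

With three slots of three phrases the menu yields 27 messages, and the single rule matches only the one it was written for.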

  21. Filtering Technique: Classification-based • Machine learning • Text classification methods: TF-IDF, Naïve Bayes, SVM (Support Vector Machine), … • Learn spam vs. good • Adapt to changing spam • Problems • Need lots of training data • Diverse contents in e-mail spam • Spammers are learning too • Images, synonyms, misspellings, … • “One man’s spam is another man’s ham”
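
As a minimal sketch of the classification-based idea, here is a small multinomial Naïve Bayes classifier with add-one smoothing. It is not the code of any tool named in this talk; the class name, the whitespace tokenizer, and the assumption that both classes have at least one training message are simplifications.

```python
import math
from collections import Counter


class NaiveBayesFilter:
    """Tiny two-class Naive Bayes text classifier (illustrative only)."""

    LABELS = ("spam", "ham")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()

    def train(self, label, text):
        self.doc_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def spam_probability(self, text):
        """Posterior P(spam | text); assumes both classes have been trained."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        log_score = {}
        for label in self.LABELS:
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            score = math.log(self.doc_counts[label] / total_docs)  # class prior
            for word in text.lower().split():
                # Add-one (Laplace) smoothing keeps unseen words from zeroing the product.
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
            log_score[label] = score
        # Normalize in log space to avoid underflow on long messages.
        top = max(log_score.values())
        weights = {k: math.exp(v - top) for k, v in log_score.items()}
        return weights["spam"] / (weights["spam"] + weights["ham"])
```

Because the model only counts words, the tricks listed under "Spammers are learning too" (images, misspellings) change the tokens it sees and so weaken it.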

  22. Filtering Techniques -- Others • Distributed (peer-to-peer, collaborative) spam filtering • To share the knowledge of spam features • SpamNet: Cloudmark • SpamWatch: UC Berkeley • Problems • Efficacy • Efficiency

  23. Distributed Spam Filtering • Cloudmark’s SpamNet [Diagram labels: MTAr, POP3/IMAP4, MUAr with client-side add-in (one per recipient); add-ins "report" to and "check" with SpamNet]

  24. Discussions on Filtering-based Approach • False-positive vs. false-negative • Cost-sensitive e-mail classification • Incoming vs. outgoing e-mail filtering • Ex. corporate mail filtering might focus on preventing leaks of confidential data
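
One way to make "cost-sensitive" concrete, as a sketch: assume that losing a legitimate message (false positive) costs `cost_ratio` times as much as letting one spam through (false negative). Marking a message as spam is then cheaper in expectation only when p > cost_ratio * (1 - p), i.e. p > cost_ratio / (1 + cost_ratio), where p is the classifier's estimated spam probability. The function names and the default ratio of 9 are assumptions for illustration.

```python
def spam_threshold(cost_ratio):
    """Probability above which marking as spam has the lower expected cost.

    cost_ratio = (cost of a false positive) / (cost of a false negative).
    """
    return cost_ratio / (1.0 + cost_ratio)


def classify(p_spam, cost_ratio=9.0):
    """Cost-sensitive decision for one message given P(spam | message)."""
    return "spam" if p_spam > spam_threshold(cost_ratio) else "ham"
```

With equal costs the threshold is the usual 0.5; with a ratio of 9 a message must be at least 90% likely to be spam before it is filtered.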

  25. Postage • Postage: to increase the cost of sending e-mails • Money: payment • Computation: time • Turing tests: challenge-response • Problems • Requires multiple monetary transactions for each e-mail delivery • Who pays for infrastructure?
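
The "computation" form of postage can be sketched as a proof-of-work stamp in the spirit of hashcash: the sender searches for a nonce whose hash has a given number of leading zero bits, while the receiver verifies with a single hash. The stamp layout `recipient:nonce`, the use of SHA-256, and the 12-bit difficulty are assumptions chosen so the example runs quickly; they are not a standard format.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest):
    """Count the zero bits at the start of a bytes digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def mint(recipient, bits=12):
    """Sender side: brute-force a nonce (about 2**bits hashes on average)."""
    for nonce in count():
        stamp = f"{recipient}:{nonce}"
        if leading_zero_bits(hashlib.sha256(stamp.encode()).digest()) >= bits:
            return stamp


def verify(stamp, recipient, bits=12):
    """Receiver side: one hash, plus a check that the stamp names this recipient."""
    digest = hashlib.sha256(stamp.encode()).digest()
    return stamp.startswith(recipient + ":") and leading_zero_bits(digest) >= bits
```

The asymmetry is the point: a stamp per recipient is negligible for ordinary mail but adds up for a sender of millions of messages.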

  26. Disposable E-mail Address • Disposable e-mail address • Separate e-mail address for each correspondence • Channelized e-mail system [R. Hall] • Sort incoming mails according to sender address • Terminate the address with spam • Problems • How do new senders get your address? • What’s the sender address for multiple receivers? • Difficult to remember
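
A minimal sketch of per-correspondent addresses, assuming the mail provider supports "plus" sub-addressing (`user+tag@domain`). This is not the channelized system cited on the slide; the secret key, tag length, and class name are hypothetical. Deriving the tag with an HMAC means the owner does not have to remember each address, which speaks to the last problem listed.

```python
import hashlib
import hmac

SECRET = b"replace-with-a-private-key"  # hypothetical key known only to the owner


def disposable_address(user, domain, correspondent):
    """Derive a stable, hard-to-guess address for one correspondent."""
    tag = hmac.new(SECRET, correspondent.lower().encode(), hashlib.sha256).hexdigest()[:8]
    return f"{user}+{tag}@{domain}"


class Channels:
    """Tracks which disposable addresses have been terminated."""

    def __init__(self):
        self.revoked = set()

    def revoke(self, address):
        # Called once an address starts receiving spam.
        self.revoked.add(address.lower())

    def accepts(self, address):
        return address.lower() not in self.revoked
```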

  27. Enhancing SMTP Mechanism • Email path verification • To trace the real origin of e-mail (sender) • Problem: accounting is needed for packet network • Authenticated SMTP • Trusted environment • SMTP authentication (RFC 2554), SMTP over SSL/TLS (RFC 3207), digital signatures (PGP, …) • Problem: need client-server cooperation
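
On the client side, authenticated SMTP might look like the following sketch using Python's standard `smtplib`. Host, port, and credentials are placeholders supplied by the caller, and the server is assumed to offer both STARTTLS and AUTH; `send_authenticated` is not executed here because it needs a real server.

```python
import smtplib
import ssl
from email.message import EmailMessage


def build_message(sender, recipient, subject, body):
    """Assemble a simple text message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_authenticated(host, port, username, password, msg):
    """Submit msg over an encrypted, authenticated SMTP session."""
    context = ssl.create_default_context()
    with smtplib.SMTP(host, port) as server:
        server.starttls(context=context)   # SMTP over TLS (RFC 3207)
        server.login(username, password)   # SMTP authentication (RFC 2554)
        server.send_message(msg)
```

This shows the "client-server cooperation" problem in miniature: both calls fail unless the server implements the matching extension.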

  28. Other Techniques (cont.) • Reputation-based approach • Based on HITS (Hyperlink Induced Topic Search) algorithm • Ranking on email sending/receiving reputation • Problem • Bad reputation for volume senders (mailing lists, newsletters, …)
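
For reference, the HITS iteration itself can be sketched on a graph whose edges mean "sender mailed recipient". How the slide's approach turns hub and authority scores into a reputation is not stated, so this only shows the scoring step; the function name and iteration count are assumptions.

```python
import math


def hits(edges, iterations=50):
    """Return (hub, authority) score dicts for a list of (sender, recipient) edges."""
    nodes = {node for edge in edges for node in edge}
    hub = dict.fromkeys(nodes, 1.0)
    auth = dict.fromkeys(nodes, 1.0)
    for _ in range(iterations):
        # Authority: sum of hub scores of everyone who sent mail to this node.
        auth = {n: sum(hub[u] for u, v in edges if v == n) for n in nodes}
        norm = math.sqrt(sum(a * a for a in auth.values())) or 1.0
        auth = {n: a / norm for n, a in auth.items()}
        # Hub: sum of authority scores of everyone this node sent mail to.
        hub = {n: sum(auth[v] for u, v in edges if u == n) for n in nodes}
        norm = math.sqrt(sum(h * h for h in hub.values())) or 1.0
        hub = {n: h / norm for n, h in hub.items()}
    return hub, auth
```

Any node that mails many recipients gets the top hub score, whether it is a spammer or a newsletter, which matches the problem noted on the slide.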

  29. Existing Anti-Spam Tools • Open Source Filters • SpamAssassin • ifile • bogofilter • POPfile • SpamBayes • CRM114 • Commercial Products • BrightMail • SurfControl • Anti-virus

  30. Spammers’ Tricks • Images: MIME • Invisible ink (hidden text): color • Misspelling • o -> 0 • i -> l -> 1 -> ! • S -> 5 • F R E E, g-i=r-l, … • Ref: John Graham-Cumming: The Spammers’ Compendium, http://www.jgc.org/tsc/index.htm
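
A filter can partly undo the misspelling tricks by mapping look-alike characters to one canonical letter and dropping separators before matching. The sketch below is a rough illustration using only the substitutions listed on the slide; the function names are made up. Both the message and the keyword are canonicalized, since folding "l" into "i" changes ordinary words as well.

```python
import re

# Fold the look-alikes from the slide onto one letter each: 0->o, 1/l/!->i, 5->s.
CANON = str.maketrans("01l!5", "oiiis")


def canonical(text):
    """Lower-case, fold look-alike characters, then drop everything but letters."""
    folded = text.lower().translate(CANON)
    return re.sub(r"[^a-z]", "", folded)


def contains_keyword(text, keyword):
    """True if the keyword appears in the text after both are canonicalized."""
    return canonical(keyword) in canonical(text)
```

The cost is more false positives: removing all spaces lets a keyword match across word boundaries, and ordinary exclamation marks become letters.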

  31. Potential Industrial Standards • Sender/Domain authentication for e-mails • Sender ID Framework (Microsoft) • DKIM (Yahoo, Cisco) • DomainKeys (Yahoo) • Identified Internet Mail (Cisco) • SPF • Sender Permitted From (AOL)

  32. Structures of E-mails • Envelope: SMTP (RFC 2821) • Header & body: RFC 2822
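
The envelope/header split matters for authentication because the two can name different senders. In this sketch the envelope is modelled as a plain dict holding the values an MTA would receive in the SMTP `MAIL FROM` and `RCPT TO` commands, and the message is parsed with Python's standard `email` package; all addresses are invented.

```python
from email import message_from_string
from email.utils import parseaddr

# Envelope: exchanged during the SMTP dialogue (RFC 2821), not part of the message.
envelope = {"mail_from": "bounce@bulk.example", "rcpt_to": ["alice@example.org"]}

# Header and body: the message content itself (RFC 2822).
raw = (
    "From: Trusted Bank <service@bank.example>\n"
    "To: alice@example.org\n"
    "Subject: Account notice\n"
    "\n"
    "Please verify your account.\n"
)

msg = message_from_string(raw)
header_from = parseaddr(msg["From"])[1]


def domain_of(address):
    """Return the part after the last '@', lower-cased."""
    return address.rsplit("@", 1)[-1].lower()


# A mismatch is not proof of forgery (mailing lists do this too), only a signal.
domains_differ = domain_of(header_from) != domain_of(envelope["mail_from"])
```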

  33. Sender ID Framework (MS)

  34. DomainKeys

  35. IIM: Authentication/Authorization Model • Messages must pass two tests before they are authenticated • Authenticate the message: the receiving domain authenticates the message, i.e., verifies that the message was not altered in any consequential manner prior to reaching the receiving domain • Authorize the sender: the receiving domain asks the sending domain to confirm that whoever signed the message was authorized to do so (without having to identify the sender)

  36. Identified Internet Mail

  37. DomainKeys Identified Mail (DKIM) • Derived from Yahoo DomainKeys and Cisco Identified Mail • IETF Working Group formed • IETF Internet draft • Message header authentication • DNS identifiers • Public keys in DNS • End-to-end • Between origin/receiver administrative domains • Not path-based
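
To illustrate "public keys in DNS": a DKIM signature header is a list of `tag=value` pairs, and the `s` (selector) and `d` (domain) tags tell the receiver which DNS name holds the signer's public key. The sketch parses the tag list and builds that name; the header value is invented, and the DNS lookup and signature check are left out.

```python
def parse_tag_list(value):
    """Parse 'tag=value; tag=value' into a dict, removing folding whitespace."""
    tags = {}
    for part in value.split(";"):
        if "=" in part:
            name, _, val = part.partition("=")
            tags[name.strip()] = "".join(val.split())
    return tags


def key_record_name(tags):
    """DNS name whose TXT record is expected to hold the signer's public key."""
    return f"{tags['s']}._domainkey.{tags['d']}"


# Invented example value of a DKIM-Signature header field.
signature = (
    "v=1; a=rsa-sha256; d=example.org; s=mail2006; "
    "h=from:to:subject; bh=aGVsbG8=; b=c2lnbmF0dXJl"
)
tags = parse_tag_list(signature)
```

Because the key is found from the signature and not from the connecting IP address, verification works after forwarding, which is what "not path-based" refers to.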

  38. SPF • Sender Policy Framework • Derived from Sender Permitted From (SPF, AOL) • By Meng Wong, CTO of Pobox • Current specification: SPFv1 (RFC 4408) • Reverse MX records • Adopted by many mail server implementations
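
The "reverse MX" idea is that a domain publishes which hosts may send mail on its behalf, and the receiver checks the connecting IP address against that list. The following is a deliberately partial sketch: it takes the record as a string instead of querying DNS, and it evaluates only the `ip4`, `ip6`, and `all` mechanisms. Mechanisms that need further lookups (`a`, `mx`, `include`, …) are skipped, so it is not a conforming SPF implementation.

```python
import ipaddress

QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def check_spf(record, client_ip):
    """Evaluate a simplified SPF record against the connecting client's IP."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"
    ip = ipaddress.ip_address(client_ip)
    for term in terms[1:]:
        qualifier = "+"                      # default qualifier is "pass"
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        name, _, value = term.partition(":")
        name = name.lower()
        if name in ("ip4", "ip6"):
            if ip in ipaddress.ip_network(value, strict=False):
                return QUALIFIERS[qualifier]
        elif name == "all":
            return QUALIFIERS[qualifier]
        # Other mechanisms would require DNS lookups and are ignored in this sketch.
    return "neutral"                         # no mechanism matched
```

For example, with the record `v=spf1 ip4:192.0.2.0/24 -all`, a connection from 192.0.2.10 evaluates to `pass` and one from 198.51.100.7 to `fail`.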

  39. Tips for End Users (1/2) • Never give out your personal e-mail address to strangers • Use separate e-mail addresses for business and public use (“disposable”) • Never respond to unsolicited e-mail • Do not click on links within unsolicited e-mail, including deceptive unsubscribe links

  40. Tips for End Users (2/2) • Read the subject line of every e-mail carefully, and use the preview feature of your mail program • If your e-mail address appears on a Web site, ask the site's manager to encode it • Use e-mail service providers that filter spam • Install an anti-spam program on your computer
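
One simple form of "encoding" an address on a Web page is to write every character as an HTML numeric character reference: browsers render it normally, but a harvester that scans the raw page for the pattern `name@domain` will not find it. This is a weak defence shown only as an illustration; the function name is made up, and a harvester that decodes entities defeats it.

```python
import html


def encode_address(address):
    """Return the address with each character as an HTML numeric reference."""
    return "".join(f"&#{ord(ch)};" for ch in address)


encoded = encode_address("user@example.org")
decoded = html.unescape(encoded)  # what a browser would display
```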

  41. Conclusion • Anti-spam is a battle • “Every time we discover a feature to catch spam, spammers will find a work-around” • Some advice • Filtering is just one part of the solution • Try to raise the costs for spammers • Be nice to your e-mail address • Mail delivery has to be improved

  42. References • IRTF ASRG: http://asrg.sp.am/ • Sender ID: http://www.microsoft.com/mscorp/safety/technologies/senderid/technology.mspx • DKIM: http://dkim.org/ • DomainKeys: http://antispam.yahoo.com/domainkeys • Identified Internet Mail: http://www.identifiedmail.com/ • SPF Project: http://www.openspf.org/ • RFCs and Internet Drafts

  43. References for Research • MIT Spam Conference (2003-2006) • http://www.spamconference.org/ • Conference on Email and Anti-Spam (CEAS) (2004-2006) • http://www.ceas.cc/ • TREC (Text REtrieval Conference) Spam Track (2005-2006) • http://trec.nist.gov/data/spam.html