Anti-Spam Strategies Joshua Alspector AOL FDIS, August 1, 2003

Seeds of Spam
• What is spam?
  • Unsolicited bulk e-mail?
  • Anything you didn’t ask for? - kill direct marketing
  • Personal definition? - affects policies, filtering
• Libertarian roots of Internet
  • Free speech by anyone to anyone
  • Trustful protocols like SMTP
  • Anonymous: no checking IDs
• Scale of e-mail
  • Cheaper and easier than postal mail
  • As easy to send 1 message as 1 million
  • Costs borne by recipient not sender

Ecology of Spam
• Low cost to sender
  • Spammers make money with a response rate of only 1 in 100,000
  • No incremental costs for bulk mail
  • Commission system for spammers
• High cost to business
  • $10B/yr in productivity, processing, anti-spam tools
  • Reduces usefulness of email
  • Other messages (IM, chat) affected as well
• Spam-a-lot
  • Can register for free accounts automatically
  • Can hijack relays, proxies
  • Can obscure IP addresses
  • Can script mail easily

Blocking Spam
• Blacklists
  • Mail, IP addresses from complaints
  • Operations likes this, keeps system costs down
  • Collateral damage, direct marketers hate this
• Whitelists
  • Buddies, address book, ‘people I know’, auto-populate
  • Special marketing arrangements – a problem
• Filters
  • Keywords, adaptive, high-volume signatures
  • Weapon of choice but must avoid collateral damage
• Challenge-Response
  • First time mailer must fill in human-readable form
  • Rude, problem with receipts, alerts, listservs
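
The four blocking techniques on this slide can be combined into a single decision step. The sketch below is only one possible arrangement, not something the slides specify: the ordering (whitelist, then blacklist, then filter score), the two thresholds, the outcome labels, and the names `Message` and `classify` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Set


@dataclass
class Message:
    sender: str      # envelope sender address
    source_ip: str   # IP address of the connecting host
    body: str        # message text handed to the content filter


def classify(
    msg: Message,
    whitelist: Set[str],
    blacklisted_ips: Set[str],
    spam_score: Callable[[str], float],
    challenge_above: float = 0.5,   # assumed threshold, not from the slides
    reject_above: float = 0.9,      # assumed threshold, not from the slides
) -> str:
    """Return one of 'deliver', 'reject', 'spam-folder', 'challenge'."""
    # Whitelist: 'people I know' bypass every other check.
    if msg.sender in whitelist:
        return "deliver"
    # Blacklist: addresses collected from complaints are refused outright.
    if msg.source_ip in blacklisted_ips:
        return "reject"
    # Filter: any scoring function that maps text to a value in [0, 1].
    score = spam_score(msg.body)
    if score >= reject_above:
        return "spam-folder"
    # Challenge-response only for the uncertain middle band, since the
    # slide notes that challenging every first-time mailer is rude.
    if score >= challenge_above:
        return "challenge"
    return "deliver"
```

Checking the whitelist first limits collateral damage for known correspondents, at the price of trusting a sender address that SMTP does not verify.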

Text Spam Filters
• Bayesian filters
  • Popular; see "Better Bayesian Filtering", http://paulgraham.com/better.html (Jan. 2003)
  • Easy to store word counts and calculate probabilities
• Adaptive, content-based technique
  • Content is what spammers can’t hide
  • Adapt as fast as spammers
• Algorithms considered
  • Naive Bayes
  • Support Vector Machine
  • Perceptron
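
To make "store word counts and calculate probabilities" concrete, here is a minimal sketch of a textbook naive Bayes filter with add-one smoothing. It is not the exact scheme from "Better Bayesian Filtering" and not the filter evaluated in this talk; the tokenizer, the class name `NaiveBayesFilter`, and the smoothing choice are assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase runs of letters, digits and a few symbols.
    return re.findall(r"[a-z0-9$'!-]+", text.lower())


class NaiveBayesFilter:
    def __init__(self):
        # Per-class word counts and message counts (True = spam, False = ham).
        self.word_counts = {True: Counter(), False: Counter()}
        self.message_counts = {True: 0, False: 0}

    def train(self, text, is_spam):
        # Adapting to new mail is just updating counts.
        self.word_counts[is_spam].update(tokenize(text))
        self.message_counts[is_spam] += 1

    def spam_probability(self, text):
        vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        total_messages = sum(self.message_counts.values())
        log_score = {}
        for label in (True, False):
            # Class prior, smoothed so an untrained filter returns 0.5.
            log_p = math.log((self.message_counts[label] + 1) / (total_messages + 2))
            denom = sum(self.word_counts[label].values()) + len(vocab)
            for word in tokenize(text):
                if word in vocab:  # words never seen in training are ignored
                    log_p += math.log((self.word_counts[label][word] + 1) / denom)
            log_score[label] = log_p
        # Turn the two log scores into P(spam | text); clamp to avoid overflow.
        diff = max(min(log_score[False] - log_score[True], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))
```

A usage pattern would be to call `train` on each message the user marks as spam or not spam, then compare `spam_probability` against a threshold chosen to keep false positives (collateral damage) low.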

Arms Race
• Adaptive filters learn what you consider spam
  • Spammers adjust (e.g. v’i’a’g’r’a, graphical, html tables)
  • Driven to deceptive subject lines, images, hijacked accounts
  • Check drop boxes to see what gets through
• More sophisticated clients
  • Picture signatures, unicode, vector graphics
  • Must learn to see in ‘eye space’
• Volume filters
  • Append random text to fool signature techniques
  • Chop up mailings in small chunks
  • Hijack open proxies, multiple ISPs
  • Scripted automatic free mail registrations
  • ISPs recently implemented Turing-type challenge
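
As a small illustration of one move in this arms race, the sketch below undoes the character-insertion trick (v’i’a’g’r’a) before text reaches a filter. The function name `deobfuscate` and the rule it applies (collapse runs of single letters separated by single punctuation characters) are assumptions, and a rule this simple would itself be easy for a spammer to evade.

```python
import re

# Three or more single letters, each separated by exactly one character
# that is not a letter, digit or whitespace (apostrophe, dot, dash, ...).
_SPACED_OUT = re.compile(r"\b(?:[A-Za-z][^A-Za-z0-9\s]){2,}[A-Za-z]\b")


def deobfuscate(text: str) -> str:
    """Collapse words such as v'i'a'g'r'a or v.i.a.g.r.a into viagra."""
    def collapse(match):
        # Keep only the letters of the matched run.
        return re.sub(r"[^A-Za-z]", "", match.group())
    return _SPACED_OUT.sub(collapse, text)
```

Ordinary contractions and abbreviations such as "don't" or "e.g." are left alone because they do not contain three single letters in a row, although dotted acronyms like "U.S.A" would be collapsed too.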

Direct Marketing Problems
• ISPs allow bulk mail from clients
  • Significant complaints from this ‘whitelisted’ mail
  • To adaptive filters, this mail looks almost identical to spam
• Direct Marketers’ Position
  • Would like to avoid spam blocks
  • Honest subject and headers
  • Opt-out mechanism
  • Seal of integrity or consent token
• Legal Approach
  • Laws against deceptive advertising
  • People love idea of ‘do not spam’ list
  • 90% of spam is untraceable to original sender
  • Much comes from Korea, China, Pakistan, Colombia, Russia, Japan

Other Strategies
• Economic
  • Transfer cost to sender
  • E.g. First 100/day free, then $.001 next 1000, $.01 next 100,000
  • E.g. Make senders post bond which receivers can collect
  • E.g. Make senders perform a compute intensive task (encrypt?)
  • Much spam comes from unsuspecting victims of hijacked accounts
• Authentication
  • Strip email of anonymity, trace like phone calls (SS7?)
  • Authenticate with encrypted tokens
  • 3rd party anonymizer
  • Unique digital stamp for each email
  • Reputation mechanism or trust seal
• Need to re-engineer e-mail and SMTP
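
The tiered pricing example can be worked through numerically. The sketch below assumes the slide's prices are per message and per day; the function name `daily_cost_mills` is hypothetical. The slide does not say what happens beyond the third tier, so the sketch refuses to guess and raises an error there.

```python
# Tiers from the slide: (messages in tier, price per message in mills),
# where 1 mill = $0.001. Integer mills avoid floating-point rounding.
TIERS = [
    (100, 0),        # first 100 messages per day are free
    (1_000, 1),      # next 1,000 at $0.001 each
    (100_000, 10),   # next 100,000 at $0.01 each
]


def daily_cost_mills(messages: int) -> int:
    """Cost of sending `messages` e-mails in one day, in mills ($0.001)."""
    if messages < 0:
        raise ValueError("message count cannot be negative")
    remaining = messages
    total = 0
    for tier_size, rate in TIERS:
        in_tier = min(remaining, tier_size)
        total += in_tier * rate
        remaining -= in_tier
        if remaining == 0:
            break
    if remaining > 0:
        # The slide gives no price past 101,100 messages per day.
        raise ValueError("volume exceeds the tiers given on the slide")
    return total
```

Under these assumptions an ordinary user sending up to 100 messages a day pays nothing, 1,100 messages cost $1.00, and the full 101,100 messages cost $1,001.00, which shows how the scheme targets bulk senders. As the slide points out, the bill would often land on the owners of hijacked accounts.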

Is Tide Turning?
• From Spam-a-lot
• To Spam-a-geddon?
• Opinions? Ideas?