Fighting the Spam Email: a Long and Tough War
Some facts about spam • Techniques to fight junk email
Spam: also called unsolicited bulk e-mail (UBE) or unsolicited commercial e-mail (UCE). • Has increased steadily or even exponentially since the early nineties • As of 2012, over 145 billion emails are sent per day • About 65% of all emails are spam (94 billion per day, most of them captured by filters) • About 80% of spam is now sent by zombies (compromised machines)
About 80% of spam is sent by fewer than 200 spammers • In 2007, spam cost businesses on the order of $100 billion per year • Most spam contains a URL: the top five countries host 99.68% of the global spammer websites • Pills, poker, and porn make up the majority of spam
Number of spam emails • 2005 - (June) 30 billion per day • 2006 - (June) 55 billion per day • 2007 - (February) 70 billion per day • 2012: 90 – 120 billion per day
Different organizations estimate that 85% to 97% of emails today are unsolicited • Bill Gates received 4 million emails/year, most of them spam
Legal issues in US • Spam is legally permissible according to the CAN-SPAM Act of 2003 if it has: • a truthful subject line • no false information in the technical headers or sender address • other minor requirements • In 2004 less than 1% of spam complied with the CAN-SPAM Act of 2003
How Spam Emails have evolved in the past years • Spam presents a problem: it changes • Spammers are adversarial • They react (very fast: hours to days) to changes in spam filtering technology • Spammers innovate constantly • The products they sell • How they send the spam (fixed IPs, broken web forms, open proxies, zombie networks, …) • The content of their messages
Once upon a time… • It was fairly easy to filter spam • It came from fixed IP addresses: use a blacklist • The From address was not forged: easy to filter • The spam contained keywords that could be blacklisted • penis, viagra, etc. • Simplistic filtering on From and content is now useless (has been for a few years) • Only complex algorithms can filter on spam content • Hashing, weighted heuristics, ‘Bayes’, …
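To make the 'Bayes' item concrete, here is a minimal naive Bayes sketch. It is not the algorithm of any particular filter: the toy training messages, the function names, and the add-one smoothing are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def train(messages):
    """Count words per class from (text, is_spam) pairs."""
    word_counts = {True: Counter(), False: Counter()}
    msg_counts = {True: 0, False: 0}
    for text, is_spam in messages:
        word_counts[is_spam].update(text.lower().split())
        msg_counts[is_spam] += 1
    return word_counts, msg_counts

def spam_log_odds(text, word_counts, msg_counts):
    """Log odds that text is spam; positive leans spam, negative leans ham.

    Assumes both classes have at least one training message.
    """
    vocab = set(word_counts[True]) | set(word_counts[False])
    totals = {c: sum(word_counts[c].values()) for c in (True, False)}
    score = math.log(msg_counts[True] / msg_counts[False])  # prior odds
    for word in text.lower().split():
        # Add-one (Laplace) smoothing so unseen words do not zero out a class.
        p_spam = (word_counts[True][word] + 1) / (totals[True] + len(vocab))
        p_ham = (word_counts[False][word] + 1) / (totals[False] + len(vocab))
        score += math.log(p_spam / p_ham)
    return score

# Made-up training data, for illustration only.
TOY_MESSAGES = [
    ("buy viagra now", True),
    ("cheap pills online", True),
    ("sales forecast attached", False),
    ("meeting notes attached", False),
]
model = train(TOY_MESSAGES)
print(spam_log_odds("buy cheap viagra", *model) > 0)   # True
print(spam_log_odds("forecast attached", *model) < 0)  # True
```

Because the score depends on every word in the message, padding a spam with innocent words (as in the Invisible Ink trick) pulls it toward the ham side unless the filter ignores text the reader cannot see.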
In 2003 • Because spammers were trying to avoid filters by all sorts of trickery • Some tricks you are familiar with… • VIAGRA has become V1@GRA • PENIS has become PE.NIS • ENLARGER is written ENARELGR • There are 10^21 ways to spell viagra • Most others are hidden from the reader
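A filter can undo some of these spelling tricks before matching keywords. The sketch below is one possible approach under stated assumptions: the look-alike table is hypothetical and far from complete, and the scrambled-letter check (same first and last letter, same letters in between) is a guess at how a word like ENARELGR might be caught, not a documented filter rule.

```python
import re

# Hypothetical look-alike table; real obfuscations use many more characters.
LOOKALIKES = str.maketrans({"1": "i", "!": "i", "@": "a", "4": "a",
                            "0": "o", "3": "e", "$": "s"})

def normalize(word):
    """Lowercase, map look-alike characters to letters, drop everything else."""
    word = word.lower().translate(LOOKALIKES)
    return re.sub(r"[^a-z]", "", word)

def matches_keyword(word, keyword):
    """True if the normalized word equals keyword, or is a scramble of it
    that keeps the first and last letters in place."""
    w = normalize(word)
    if w == keyword:
        return True
    return (len(w) == len(keyword)
            and w[0] == keyword[0]
            and w[-1] == keyword[-1]
            and sorted(w) == sorted(keyword))

print(normalize("V1@GRA"))                      # viagra
print(normalize("PE.NIS"))                      # penis
print(matches_keyword("ENARELGR", "enlarger"))  # True
```

The scramble check will also match some legitimate words, so in practice it would feed a weighted score rather than reject a message outright.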
The Spammers’ Compendium • Hosted at Virus Bulletin now • Informal collection of spammer tricks • All about content of spam, not how it’s sent • Been collecting since 2003 • Now a total of 65 tricks • All have been seen in the wild
Fun facts • 80% of spam uses HTML • Colors, images, fonts… • … and tricks • 80% of spam uses at least one content trick • Invisible Ink, Camouflage, … • Spam and spam filters are in an arms race • Irony: Spammer tricks often make spam easier to filter • Who spells Viagra V1@GGR@?
The Spam Zeitgeist (1) • Spammers realize that spam filters spot their tricks, so they are trying… • Short plain text emails with a URL • Anti-spammer response: URL blacklist • Spammer response: use redirector • Anti-spammer response: follow redirector • Spammer response: use Geocities with complex page that reloads using encoded Javascript
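The first two moves of this exchange can be sketched offline. The code below is only an illustration: the domains are placeholders under the reserved `.example` name, and `blacklisted_urls` is a hypothetical helper, not part of any real filter.

```python
import re
from urllib.parse import urlparse

URL_PATTERN = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)

def blacklisted_urls(text, blacklist):
    """Return the URLs in text whose host is a blacklisted domain or a subdomain of one."""
    hits = []
    for url in URL_PATTERN.findall(text):
        host = (urlparse(url).hostname or "").lower()
        if any(host == d or host.endswith("." + d) for d in blacklist):
            hits.append(url)
    return hits

blacklist = {"spam.example"}  # placeholder domain

direct = "Visit http://pills.spam.example/buy now"
print(blacklisted_urls(direct, blacklist))
# ['http://pills.spam.example/buy']

# A redirector hides the real destination behind an innocent-looking host.
redirected = "Visit http://redirect.example/?to=http://spam.example/"
print(blacklisted_urls(redirected, blacklist))
# []
```

The second call returns nothing because only the redirector's host is checked, which is why the anti-spammer response was to follow the redirect before consulting the blacklist.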
The Spam Zeitgeist (2) • Spammers realize that spam filters read their mail… • Spammer response: send an image instead of text • Anti-spammer response: checksum the images • Spammer response: make random modification of image and number of images • Anti-spammer response: perform OCR on images • Spammer response: add random noise to images
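The checksum step and its weakness can be shown in a few lines. This is a sketch under assumptions: the byte string stands in for real image data, and SHA-256 is used only as a convenient checksum, since the slides do not say which one filters used.

```python
import hashlib

def checksum(image_bytes: bytes) -> str:
    """Exact-match fingerprint of an image's bytes."""
    return hashlib.sha256(image_bytes).hexdigest()

original = bytes([200] * 64)   # stand-in for the bytes of a known spam image
known_spam = {checksum(original)}

tweaked = bytearray(original)
tweaked[10] ^= 1               # flip one bit in one byte
tweaked = bytes(tweaked)

print(checksum(original) in known_spam)  # True
print(checksum(tweaked) in known_spam)   # False
```

A single changed bit produces a different checksum, so random modifications defeat exact matching and push filters toward reading the image content itself, for example with OCR.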
Example: Hypertextus Interruptus • A once popular trick that has fallen out of favor • Use HTML's commenting mechanism to break up bad words • HTML comments are written <!-- comment --> and the entire sequence is ignored and not displayed. • Easy to break up a word like Viagra: V<!-- banana -->i<!-- wumpus -->a<!-- dinosaur -->g<!-- potato -->r<!-- amtrak -->a
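Undoing this trick is simple, which may be part of why it fell out of favor. A minimal sketch, assuming the filter works on the raw HTML source; the regular expression handles ordinary comments and is not a full HTML parser.

```python
import re

# Non-greedy match so each comment is removed separately; DOTALL lets
# a comment span several lines.
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)

def strip_comments(html):
    """Remove HTML comments so that broken-up words are rejoined."""
    return HTML_COMMENT.sub("", html)

obfuscated = ("V<!-- banana -->i<!-- wumpus -->a<!-- dinosaur -->"
              "g<!-- potato -->r<!-- amtrak -->a")
print(strip_comments(obfuscated))  # Viagra
```

The number of comments removed is itself a useful signal, since legitimate mail rarely puts comments in the middle of words.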
Example: Invisible Ink (1) • Hide lots of good words using white font on a white background • Before: Buy Viagra Now! Please see the attached spreadsheet for our current sales forecast
Example: Invisible Ink (2) • Use HTML <font color=xyz> tag to make the good words disappear • After: Buy Viagra Now! Please see the attached spreadsheet for our current sales forecast • <font color=white> … </font>
Example: Camouflage (1) • Like Invisible Ink but use slightly different colors (almost white on white) • Before: Buy Viagra Now! Please see the attached spreadsheet for our current sales forecast
Example: Camouflage (2) • Like Invisible Ink but use slightly different colors (almost white on white) • Add a colored background: Buy Viagra Now! Please see the attached spreadsheet for our current sales forecast • <body bgcolor=#CC9900> … </body>
Example: Camouflage (3) • Like Invisible Ink but use slightly different colors (almost white on white) • Finally, color the text almost the same: Buy Viagra Now! Please see the attached spreadsheet for our current sales forecast • <font color=#BB8811> … </font>
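Both Invisible Ink and Camouflage come down to text whose color is too close to its background. One way a filter might detect this is sketched below; the Euclidean RGB distance and the threshold of 40 are assumptions picked for illustration, and named colors such as `white` would first need to be mapped to hex values.

```python
def parse_hex_color(value):
    """Turn '#RRGGBB' into an (r, g, b) tuple of integers."""
    value = value.lstrip("#")
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))

def color_distance(fg, bg):
    """Euclidean distance between two colors in RGB space."""
    pairs = zip(parse_hex_color(fg), parse_hex_color(bg))
    return sum((a - b) ** 2 for a, b in pairs) ** 0.5

def is_camouflaged(fg, bg, threshold=40.0):
    """True if the text color is too close to the background to read.
    The threshold is a hypothetical value, not a tuned one."""
    return color_distance(fg, bg) < threshold

print(is_camouflaged("#BB8811", "#CC9900"))  # True  (Camouflage example)
print(is_camouflaged("#FFFFFF", "#FFFFFF"))  # True  (Invisible Ink)
print(is_camouflaged("#000000", "#FFFFFF"))  # False (normal black on white)
```

Words flagged this way can be dropped before the message reaches a content classifier, so the hidden good words no longer help the spammer.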
Example: The Matrix • Prevent spam filter from reading the text in a spam by writing it vertically • Start with: Buy Generic Viagra Cheap
Example: The Matrix • Prevent spam filter from reading the text in a spam by writing it vertically
B G V C
u e i h
y n a e
  e g a
  r r p
  i a
  c
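A filter that reads row by row sees only fragments such as "BGVC", but reading the same grid column by column recovers the words. The sketch below assumes the grid cells have already been extracted (for example from table cells) into lists with empty strings for blank cells; that extraction step is not shown.

```python
from itertools import zip_longest

def read_columns(rows):
    """Join the cells of each column, top to bottom, into one word per column."""
    return ["".join(col) for col in zip_longest(*rows, fillvalue="")]

rows = [
    ["B", "G", "V", "C"],
    ["u", "e", "i", "h"],
    ["y", "n", "a", "e"],
    ["",  "e", "g", "a"],
    ["",  "r", "r", "p"],
    ["",  "i", "a", ""],
    ["",  "c", "",  ""],
]
print(read_columns(rows))  # ['Buy', 'Generic', 'Viagra', 'Cheap']
```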
Example: Catch a Wave (1) • Split a sentence into two lines and then put them back together again • Start with: Increase your sexual desire
Example: Catch a Wave (2) • Split a sentence into two lines and then put them back together again • Make two lines: "Inc se yo se al des" and "rea ur xu ire" • <tr align=bottom><td></td><td>rea</td> … <td>ire</td></tr>
Example: Catch a Wave (3) • Split a sentence into two lines and then put them back together again • Back together again: Increase your sexual desire • <tr align=bottom><td rowspan=2>Inc</td><td></td> … <td></td></tr>
Example: The Rake (1) • Break up a word with random letters, then move the letters out of the way • Start with: Viagra
Example: The Rake (2) • Break up a word with random letters, then move the letters out of the way • Sprinkle in some random letters: Vxifatgyrka
Example: The Rake (3) • Break up a word with random letters, then move the letters out of the way • Mark each letter to be moved out of the way: Vxifatgyrka • <span style="float:right">x</span>
Example: The Rake (4) • Break up a word with random letters, then move the letters out of the way • The end result: Viagra xftyk
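A filter can reverse The Rake by treating floated spans as text the reader will not see in place. The sketch below is an assumption-laden illustration: it handles only the exact inline `float:right` style shown above, using a regular expression instead of a real HTML and CSS parser.

```python
import re

# Matches <span style="float:right">…</span>, with or without quotes.
FLOATED_SPAN = re.compile(
    r'''<span\s+style=["']?float:\s*right["']?\s*>(.*?)</span>''',
    re.IGNORECASE | re.DOTALL,
)

def split_rake(html):
    """Return (text left in place, decoy letters in source order)."""
    decoys = "".join(FLOATED_SPAN.findall(html))
    in_place = FLOATED_SPAN.sub("", html)
    return in_place, decoys

def rake(word, decoy_letters):
    """Rebuild the trick for testing: put a floated decoy after each
    of the first len(decoy_letters) letters of word."""
    out = []
    for i, ch in enumerate(word):
        out.append(ch)
        if i < len(decoy_letters):
            out.append('<span style="float:right">%s</span>' % decoy_letters[i])
    return "".join(out)

html = rake("Viagra", "xftyk")
print(split_rake(html))  # ('Viagra', 'xftyk')
```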
Example: Whiter shade of pale (1) • Concatenate words using random greyed out letters • Start with: Offshore pharmacy online now
Example: Whiter shade of pale (2) • Concatenate words using random greyed out letters • Add random letters between words: OffshoreGpharmacyUonlineInow
Example: Whiter shade of pale (3) • Concatenate words using random greyed out letters • Grey or white out the letters: OffshoreGpharmacyUonlineInow • <font color=lightgrey>G</font>
Image Spam • Spammers currently like image spam because… • Text based filters are getting really good • Hard to read the text in an image • URL blacklists are catching a lot of spam • Recipient is usually asked to type in a URL in the image • Hard to automatically extract that URL for blacklisting • An image is hard for a machine to interpret • Lots of latitude for obscuring the image
Will it end? • No • People buy from spam • Pew Internet Trust 2003 Survey: 7% • My 2004 Survey: 1% • But a 0.001% response rate is break even • End users are seeing less and less spam • Spam has moved from an end-user problem to a sysadmin problem