
  2. Spam defined, spam, spam and spam • Our working definition of spam: Unsolicited Bulk Email • Unsolicited • Recipients did not request mail from sender • Bulk • Single message sent to many recipients • Spam is about consent, not content • Spam is not just commercial or advertising email, but any unwanted bulk delivery

  3. Spam vs. legitimate bulk email • Solicited bulk emailing is generally accepted • Recipients request subscription directly • Subscription is confirmed so parties cannot add unwilling recipients • Subscriber must reply to confirmation message sent to subscription address to complete subscription • Subscribers must be able to change subscription addresses or unsubscribe at will • Human (or automated) list management removes invalid or undeliverable addresses
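
The confirmed-subscription flow on this slide can be sketched roughly as follows. This is a minimal illustration, not any particular list manager's implementation: the `MailingList` class, its method names, and the token scheme are all assumptions made for the example, and the step of actually mailing the token to the address is left out.

```python
import secrets


class MailingList:
    """Hypothetical confirmed-subscription list (illustrative only)."""

    def __init__(self):
        self.pending = {}         # token -> address awaiting confirmation
        self.subscribers = set()  # confirmed addresses only

    def request_subscription(self, address):
        # The token would be mailed to `address`; only someone who can read
        # mail at that address can send it back to complete the subscription.
        token = secrets.token_urlsafe(16)
        self.pending[token] = address.lower()
        return token

    def confirm(self, token):
        # Complete the subscription only if the token was actually issued.
        address = self.pending.pop(token, None)
        if address is None:
            return False
        self.subscribers.add(address)
        return True

    def unsubscribe(self, address):
        # Subscribers can leave at will.
        self.subscribers.discard(address.lower())

    def remove_undeliverable(self, bounced_addresses):
        # List management: drop addresses that bounce.
        self.subscribers -= {a.lower() for a in bounced_addresses}
```

The point of the design is that a third party who submits someone else's address never sees the token, so the unwilling recipient gets at most one confirmation request and is never added to the list.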

  4. The problem of spam • Spam costs are primarily borne by recipients • Recipients pay for services to receive email • Compare with postal bulk mail, where senders assume much more of the cost • The cost of sending one email is not much different from the cost of sending 1,000 emails • Low cost of sending spam means less resistance to doing it • Disincentives are mainly social, not financial

  5. The problem of spam, cont. • Spam is becoming a significant fraction of all email processed • For November 6: • Gladstone delivers about 36,000 messages, rejects about 17,000 spams • Darkwing delivers about 35,000 messages, rejects about 4,700 spams • Yes, I know many of you probably still saw spam anyway • This was not a particularly bad day; we have had days where spam rejection exceeded normal mail volume

  6. Top N spammers, November 6

  7. Why do spammers spam? • Many spammers claim to be engaging in commercial activity, but few make money • Unfortunately, they also don't lose enough money to discourage them • Some profit by selling 'email advertising services' to naïve customers • Who then find out their 'email advertising campaign' has thoroughly cheesed off their audience • The sociopathic theory of spammers • Mainly they do it because they get off on it

  8. Background: SMTP protocol • 'Simple' Mail Transfer Protocol • Standardized since about 1982 • How email is transferred between hosts • Analogies with postal mail • 'envelope' vs. message headers/body • 'envelope sender' corresponds to return address • 'envelope recipient' corresponds to recipient address • Addresses in headers do not have to correspond to envelope addresses
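
To make the envelope/header distinction concrete, here is a small Python sketch using only the standard library's `email` package. All addresses are made-up examples and nothing is sent over the network; the sketch only shows that the envelope addresses and the header addresses are separate pieces of data.

```python
from email.message import EmailMessage


def build_message(header_from, header_to, subject, body):
    """Build the headers and body: the 'letter inside the envelope'."""
    msg = EmailMessage()
    msg["From"] = header_from
    msg["To"] = header_to
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


# Envelope addresses are what an SMTP client gives the server in the
# MAIL FROM and RCPT TO commands. They live outside the message itself.
envelope_sender = "bounces@bulk.example"
envelope_recipients = ["alice@example.edu"]

# Header addresses are just text inside the message and can say anything.
msg = build_message(
    header_from="Friendly Offers <offers@elsewhere.example>",
    header_to="Valued Customer <someone@else.example>",
    subject="Hello",
    body="The headers above need not match the envelope.\n",
)

header_matches_envelope = envelope_sender in msg["From"]
```

In Python's own `smtplib`, the same split shows up in the API: `SMTP.sendmail(from_addr, to_addrs, msg)` takes the envelope sender and recipients as separate arguments from the message text.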

  9. SMTP protocol example

  10. Message headers

  11. Characteristics of spam email • Forged headers are the norm • Easy for spammer to change • Tends to obfuscate origin for naïve readers • Common avenues of spam delivery • Spam-for-hire firms with specific network locations • ISP modem or DSL pools • Unsecured SMTP relays • Unsecured HTTP proxies

  12. Why is spam hard to block? • Spammer can control envelope sender, most message contents • Spam-for-hire outfits are always registering new domains • Spammers change ISPs (usually because they're kicked off one and move to another) • Spammers are exploiting an unknown (maybe unknowable) pool of open SMTP relays and HTTP/SOCKS proxies

  13. What can we do to block spam? • Reject sender domains commonly used by spammers • Reject connections from IP address ranges under control of spammers, or of spam-friendly ISPs • Use available databases of IP addresses of open SMTP relays, open HTTP/SOCKS proxies • Some checks for valid DNS, valid header syntax • Content filtering

  14. Details of local spam blocking • Sendmail /etc/mail/access • Primarily a database of domains or specific sender addresses that are rejected if they appear in the domain name of the connecting client or in the envelope sender • cyberpromo.com REJECT rejects sanford@cyberpromo.com, webmaster@www.cyberpromo.com, etc. • friend@public.com REJECT rejects only that specific address • Currently ~2300 entries on darkwing/gladstone
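
The matching behavior described on this slide can be modeled roughly as below. This is a simplified sketch of the behavior as the slide states it, not sendmail's actual lookup code; the function names and the in-memory `ACCESS` dictionary are assumptions for illustration.

```python
# Two entries taken from the slide; a real access database has many more.
ACCESS = {
    "cyberpromo.com": "REJECT",
    "friend@public.com": "REJECT",
}


def domain_candidates(domain):
    """Yield the domain and each parent: a.b.c -> a.b.c, b.c, c."""
    labels = domain.lower().split(".")
    for i in range(len(labels)):
        yield ".".join(labels[i:])


def check_sender(envelope_sender, access=ACCESS):
    """Check a full address first, then its domain and parent domains."""
    sender = envelope_sender.lower()
    if sender in access:
        return access[sender]
    _, _, domain = sender.rpartition("@")
    for candidate in domain_candidates(domain):
        if candidate in access:
            return access[candidate]
    return "OK"


def check_client(client_hostname, access=ACCESS):
    """Check the connecting client's host name against domain entries."""
    for candidate in domain_candidates(client_hostname):
        if candidate in access:
            return access[candidate]
    return "OK"
```

Matching on whole labels means `cyberpromo.com` catches `www.cyberpromo.com` but not an unrelated name such as `notcyberpromo.com`.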

  15. Details of local spam blocking • /etc/hosts.deny (TCP wrappers) • Used primarily to block specific IP address ranges • Could be done in /etc/mail/access, but we have a mechanism to export this to VMS PMDF • Hoping to phase out in favor of DNS blacklists • Example: • sendmail: .cyberpromo.com 192.168.1.
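
The example rule above uses two TCP wrappers pattern forms: a pattern that starts with a dot matches the end of the client's host name, and a pattern that ends with a dot matches the start of the client's IP address. The sketch below models only those two forms plus exact matches; the function names are made up for illustration, and real TCP wrappers supports more pattern types than this.

```python
RULE = "sendmail: .cyberpromo.com 192.168.1."


def pattern_matches(pattern, client_host, client_ip):
    """Model two hosts.deny pattern forms, plus exact matches."""
    if pattern.startswith("."):
        # Leading dot: match the tail of the host name.
        return client_host.lower().endswith(pattern.lower())
    if pattern.endswith("."):
        # Trailing dot: match the leading fields of the IP address.
        return client_ip.startswith(pattern)
    return pattern in (client_host, client_ip)


def denied(rule, daemon, client_host, client_ip):
    """Return True if a 'daemon: pattern pattern ...' rule blocks the client."""
    rule_daemon, _, clients = rule.partition(":")
    if rule_daemon.strip() != daemon:
        return False
    return any(pattern_matches(p, client_host, client_ip)
               for p in clients.split())
```

Keeping the trailing dot in `192.168.1.` matters: without it, the prefix would also match addresses such as `192.168.10.1`.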

  16. DNS blacklists • Special use of name server software to maintain databases of IP addresses for spam rejection • IP address is looked up under a specific zone (domain); a positive result for the lookup indicates the IP address is present in the database • Example lookups for 'spamdb.org' zone: • => (IP is considered a spam source) • => not found (IP is not)
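
A DNS blacklist lookup can be sketched in a few lines of Python. By the usual DNSBL convention, the octets of the IPv4 address are reversed and prepended to the zone name, and any successful answer means "listed". The zone `spamdb.org` is the slide's example zone, the IP address below is from a documentation-only range, and the function names are assumptions. Treating every resolver failure as "not listed" is a simplification, since a real mail server would want to tell timeouts apart from negative answers.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the lookup name: reversed IPv4 octets followed by the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone, resolve=socket.gethostbyname):
    """Return True if the blacklist zone has an entry for this IP.

    `resolve` is injectable so the logic can be exercised without a network.
    """
    try:
        resolve(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False  # name not found: IP is not in the database
    return True
```

For example, `dnsbl_query_name("192.0.2.99", "spamdb.org")` produces the name `99.2.0.192.spamdb.org`, which is what would be sent to the resolver.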

  17. DNS blacklists used locally • MAPS RBL+ (http://mail-abuse.org/) • rbl-plus.mail-abuse.org is lookup zone • Combination of blacklists: • RBL - mainly 'career spammers' • DUL - ISP dialup IPs, mainly contributed by the ISPs themselves • RSS - open SMTP relays • Available by subscription only, we negotiated a campus-wide subscription • DUL and RSS most effective parts

  18. DNS blacklists used locally, cont. • Spamhaus SBL (http://spamhaus.org/SBL/) • sbl.spamhaus.org is lookup zone • Free for all to use (very large sites encouraged to mirror zone data) • Very extensive listing of 'career' spammers, more effective than RBL

  19. DNS blacklists used locally, cont. • Blitzed OPM (http://blitzed.org/opm/) • opm.blitzed.org is lookup zone • Lists open HTTP/SOCKS proxies • Osirusoft DNSBL (http://relays.osirusoft.com) • relays.osirusoft.com is lookup zone • Combines many other blacklists • We currently use only the open HTTP/SOCKS proxies subset (lookups return

  20. Avoiding 'collateral damage' • DNS blacklists chosen for consistency and stability of policies • Local blocks are based on user reports • Mail is always either delivered or bounced (returned to sender) • We avoid blocking domains sending apparently legitimate mail, even if some mail is unsolicited • Persistent spam and unresponsive management can overcome our reluctance

  21. Content filtering • I'm not a big fan • Tends to be complicated and difficult to predict • Examines message data entirely under spammer control, so it's easy for spammers to elude • Content criteria vary from person to person, so difficult to implement well on a system-wide level • However, can catch spam that eludes previously-described blocking methods

  22. Content filtering with spamassassin • Spamassassin is a (rather large) Perl script that applies many content checks to messages • Invoked from user .procmailrc at delivery time:

        :0fw
        | /usr/local/bin/spamassassin -P

        :0:
        * ^X-Spam-Status: Yes
        caughtspam

      • Works pretty well, but has a noticeable false positive rate • User customization and 'whitelisting' are recommended
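
The procmail recipe on this slide first pipes the message through spamassassin, which adds an X-Spam-Status header, and then files any message whose header begins with "Yes" into the caughtspam folder. The Python sketch below mimics only that second filing decision. The function name and the default folder name `inbox` are assumptions, and the sample header values in the usage note are invented.

```python
from email import message_from_string


def delivery_folder(raw_message, spam_folder="caughtspam",
                    default_folder="inbox"):
    """Pick a folder based on the X-Spam-Status header, if present."""
    msg = message_from_string(raw_message)
    status = msg.get("X-Spam-Status", "")
    # procmail conditions are case-insensitive by default, so compare
    # in lower case here as well.
    if status.strip().lower().startswith("yes"):
        return spam_folder
    return default_folder
```

A message tagged `X-Spam-Status: Yes, ...` goes to `caughtspam`, while untagged or `No` messages fall through to normal delivery. Because of the false positive rate, the caught-spam folder still needs to be reviewed from time to time.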

