Fighting Spam: Enterprise Spam Filtering Using Open Source Tools
Introduction • Newsflash: spam is a problem • SRJC: 60-80% of mail received is spam! • Commercial solutions exist, but are expensive • Open source tools are a powerful alternative
Tonight’s Agenda • SpamAssassin Overview • Additional Spam Rules (SARE) • Integrating with Multiple Mail Servers • Bayesian Filtering
SpamAssassin – How It Works • Uses the combined score from multiple types of checks to determine whether a given message is spam: • Header tests • Body phrase tests • Bayesian filtering • Automatic address whitelist/blacklist • Manual address whitelist/blacklist • Collaborative spam identification databases (DCC, Pyzor, Razor2) • DNS blocklists ("RBLs") • Character sets and locales • Although any one of these tests might, by itself, misidentify a ham or spam message, their combined score is very difficult to fool.
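A minimal Python sketch of the combining idea, assuming a simplified model: each test that fires contributes a score, the scores are summed, and the message is tagged as spam when the total reaches the threshold. The rule names and per-rule scores are hypothetical; 5.0 is SpamAssassin's default required_score.

```python
from dataclasses import dataclass

# SpamAssassin's default threshold (the "required_score" setting).
REQUIRED_SCORE = 5.0


@dataclass(frozen=True)
class RuleHit:
    """One test that fired on a message, with the score it contributes."""
    name: str
    score: float


def combined_verdict(hits, required=REQUIRED_SCORE):
    """Sum the scores of all fired tests; spam if the total reaches the threshold."""
    total = sum(hit.score for hit in hits)
    return total, total >= required


# Hypothetical hits: no single test crosses 5.0 on its own.
hits = [
    RuleHit("HYPOTHETICAL_HEADER_TEST", 1.5),
    RuleHit("HYPOTHETICAL_BODY_PHRASE", 2.0),
    RuleHit("HYPOTHETICAL_DNS_BLOCKLIST", 2.5),
]
print(combined_verdict(hits))  # (6.0, True)

# A negative score (e.g. a manual whitelist entry) can pull the total back down.
print(combined_verdict(hits + [RuleHit("HYPOTHETICAL_WHITELIST", -3.0)]))  # (3.0, False)
```

The point is the one the slide makes: no single test decides the outcome, so one misfiring test rarely flips the verdict on its own.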
SpamAssassin - Advantages • Wide spectrum of different tests • Open source and free! • Flexible: works with many platforms and servers • Easy configuration
SpamAssassin Rules Emporium • http://rulesemporium.com/ • Popular repository for third-party SpamAssassin rules • “Actively” updated between SpamAssassin releases
SARE Usage Guidelines • Just download rules into the SpamAssassin directory (e.g. /etc/spamassassin) • Restart the daemon if necessary • Most popular rules come in “levels” (e.g. 0 = conservative, 3 = aggressive) • Choose the rules you use carefully!
Rules Du Jour • http://www.exit0.us/index.php?pagename=RulesDuJour • Automates downloading, installing, and updating the most popular SARE rules
Rules Du Jour • Install the script in your $PATH (e.g. /usr/local/sbin) and make it executable • Create a blank configuration file at /etc/rulesdujour/config • Add a TRUSTED_RULESETS line to the config file listing the names of the rulesets you chose, e.g.: • TRUSTED_RULESETS="SARE_ADULT SARE_OBFU0 SARE_OBFU1 SARE_URI0 SARE_URI1" • Configure any local settings, for example: • SA_DIR="/etc/mail/spamassassin" • MAIL_ADDRESS="administrator@example.com" • SA_RESTART="killall -HUP spamd" • Run the script periodically (manually or via crontab)
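If you maintain several filtering hosts, you might generate this config file rather than edit it by hand. A minimal Python sketch, assuming (as the examples above suggest) that the file consists of shell-style VAR="value" assignments; render_rdj_config is a hypothetical helper, not part of Rules Du Jour.

```python
# Characters that the shell would interpret inside a double-quoted string.
UNSAFE_CHARS = set('"$`\\')


def render_rdj_config(rulesets, settings):
    """Build the text of /etc/rulesdujour/config.

    Assumes the file holds shell-style VAR="value" lines, as in the slide's
    examples. Values containing shell metacharacters are rejected rather
    than escaped, which keeps the output easy to audit.
    """
    pairs = [("TRUSTED_RULESETS", " ".join(rulesets))] + list(settings.items())
    lines = []
    for key, value in pairs:
        if UNSAFE_CHARS & set(value):
            raise ValueError(f"unsafe character in {key}: {value!r}")
        lines.append(f'{key}="{value}"')
    return "\n".join(lines) + "\n"


config = render_rdj_config(
    ["SARE_ADULT", "SARE_OBFU0", "SARE_OBFU1", "SARE_URI0", "SARE_URI1"],
    {
        "SA_DIR": "/etc/mail/spamassassin",
        "MAIL_ADDRESS": "administrator@example.com",
        "SA_RESTART": "killall -HUP spamd",
    },
)
print(config, end="")
```

The output reproduces the slide's example lines; write it to /etc/rulesdujour/config on each host and schedule the script as described above.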
SpamAssassin Serving Multiple Servers • Problem: • How do you keep multiple mail servers synchronized? • Spam checking adds load to each mail server
SpamAssassin Serving Multiple Servers • Solution: use a single machine to manage spam site-wide! • Logs and configuration are unified on a single machine • Spam-checking load moves off the mail servers
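SA/multi-server – set up server • Server must be running SpamAssassin as a daemon (spamd -d) • Server must allow connections from the mail servers; -A lists the permitted client addresses (e.g. spamd -A 127.0.0.1,192.168.1.10,192.168.1.11) • Make sure the server listens on an interface the clients can reach (spamd binds to 127.0.0.1 unless told otherwise with -i) and on port 783 (spamd’s default port)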
SA/multi-server – set up client • Use the “spamc” command instead of “spamassassin” • Point spamc at the remote server with -d: spamc -d 192.168.1.10 • Test: • spamc -d my.server.net < /path/to/sample/email
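What spamc does on the wire can be sketched in Python. This assumes the plain-text spamd protocol described in the PROTOCOL file shipped with SpamAssassin: the client sends a CHECK request with a Content-length header followed by the message, and spamd answers with a status line and a "Spam: True ; score / threshold" header. The function names are hypothetical; verify the protocol version your spamd speaks, and keep using spamc in production.

```python
import socket

SPAMD_PORT = 783  # spamd's default port, as on the server slide


def build_check_request(message: bytes) -> bytes:
    """Frame a message as a spamd CHECK request (protocol SPAMC/1.2)."""
    header = f"CHECK SPAMC/1.2\r\nContent-length: {len(message)}\r\n\r\n"
    return header.encode("ascii") + message


def parse_check_response(response: bytes):
    """Return (is_spam, score, threshold) from a spamd CHECK reply."""
    head = response.partition(b"\r\n\r\n")[0].decode("ascii", "replace")
    lines = head.split("\r\n")
    status = lines[0].split(None, 2)  # e.g. ["SPAMD/1.1", "0", "EX_OK"]
    if len(status) < 2 or not status[0].startswith("SPAMD/") or status[1] != "0":
        raise RuntimeError(f"spamd returned an error: {lines[0]!r}")
    for line in lines[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "spam":
            verdict, _, scores = value.partition(";")
            score, _, threshold = scores.partition("/")
            is_spam = verdict.strip().lower() in ("true", "yes")
            return is_spam, float(score), float(threshold)
    raise RuntimeError("spamd reply had no Spam: header")


def check_message(host: str, message: bytes, port: int = SPAMD_PORT, timeout: float = 30.0):
    """Ask a remote spamd for a verdict, similar to `spamc -c -d host`."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_check_request(message))
        sock.shutdown(socket.SHUT_WR)  # signal that the request is complete
        chunks = []
        while chunk := sock.recv(4096):  # spamd closes the connection after replying
            chunks.append(chunk)
    return parse_check_response(b"".join(chunks))
```

To mirror the slide's test, read /path/to/sample/email in binary mode and pass its bytes to check_message("my.server.net", ...); the host must appear in the server's -A list.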
Bayesian Filtering - Introduction • “Bayesian Filtering uses statistics from previously-classified messages to estimate the likelihood that a particular message is spam.”* • “This likelihood estimate is converted to a (possibly negative) weight which is added to the ad hoc spamminess score.”* • * Gordon V. Cormack and Thomas R. Lynam, University of Waterloo
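To make the quoted description concrete, here is a toy Python classifier in the style popularized by SpamBayes: Robinson's smoothed per-token spam probabilities, combined with Fisher's chi-square method into one estimate between 0 (ham) and 1 (spam). SpamAssassin's Bayes engine belongs to the same family (it ships a chi-square combiner), but its tokenizer, constants, and storage differ; the class name and the constants s, x, and min_dev below are illustrative assumptions, not SpamAssassin's values.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokens(text):
    """Lower-cased word tokens, each counted once per message."""
    return set(TOKEN_RE.findall(text.lower()))


def chi2q(x2, v):
    """P(chi-square with v (even) degrees of freedom >= x2)."""
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


class ToyBayes:
    """Robinson token probabilities + Fisher chi-square combining (illustrative)."""

    def __init__(self, s=1.0, x=0.5, min_dev=0.1):
        self.s, self.x, self.min_dev = s, x, min_dev  # illustrative constants
        self.spam_counts, self.ham_counts = Counter(), Counter()
        self.nspam = self.nham = 0

    def learn(self, text, is_spam):
        """Record one previously classified message."""
        if is_spam:
            self.spam_counts.update(tokens(text))
            self.nspam += 1
        else:
            self.ham_counts.update(tokens(text))
            self.nham += 1

    def token_prob(self, tok):
        """Smoothed P(spam | token), or None if the token gives no evidence."""
        s_cnt, h_cnt = self.spam_counts[tok], self.ham_counts[tok]
        n = s_cnt + h_cnt
        if n == 0 or self.nspam == 0 or self.nham == 0:
            return None
        spam_ratio, ham_ratio = s_cnt / self.nspam, h_cnt / self.nham
        p = spam_ratio / (spam_ratio + ham_ratio)
        # Shrink toward x when a token has been seen only a few times.
        return (self.s * self.x + n * p) / (self.s + n)

    def score(self, text):
        """Likelihood (0..1) that the message is spam; 0.5 means no evidence."""
        clues = [p for p in (self.token_prob(t) for t in tokens(text))
                 if p is not None and abs(p - 0.5) >= self.min_dev]
        if not clues:
            return 0.5
        n = len(clues)
        spamminess = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in clues), 2 * n)
        hamminess = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in clues), 2 * n)
        return (spamminess - hamminess + 1.0) / 2.0


bayes = ToyBayes()
for text in ("Buy cheap viagra now", "Cheap pills, buy now!"):
    bayes.learn(text, is_spam=True)
for text in ("Meeting agenda for tonight", "Agenda notes from the meeting"):
    bayes.learn(text, is_spam=False)
print(bayes.score("cheap pills now"), bayes.score("meeting notes tonight"))
```

The first message scores close to 1 and the second close to 0. SpamAssassin then turns such a probability into a weighted rule hit (the BAYES_* rules), which is the "weight added to the spamminess score" from the quote.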
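Bayes – Getting Started • Enable Bayes in the config: use_bayes 1 • Set aside space for the Bayes database (either file-based or SQL): • bayes_path /var/local/spamassassin/bayes (a filename prefix; SpamAssassin appends suffixes such as _toks and _seen) • or • bayes_store_module Mail::SpamAssassin::BayesStore::SQL (also set bayes_sql_dsn and the database credentials)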
Bayes – Getting Started • Feed Bayes “ham” and “spam” • You MUST feed it samples of good and bad messages to start! • At least 200 samples of each, but use as many as possible • sa-learn --spam --dir /path/to/directory/full/of/spam/msgs • sa-learn --ham --dir /path/to/directory/full/of/ham/msgs
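A small Python pre-flight check before running the two sa-learn commands, assuming each directory holds one message per file (as --dir implies). The 200 minimum matches SpamAssassin's defaults for bayes_min_spam_num and bayes_min_ham_num, below which Bayes results are not used; the helper names are hypothetical.

```python
from pathlib import Path

# The slide's minimum; also SpamAssassin's default bayes_min_spam_num / bayes_min_ham_num.
MIN_SAMPLES = 200


def count_messages(directory):
    """Count regular files in a one-message-per-file directory."""
    return sum(1 for path in Path(directory).iterdir() if path.is_file())


def training_commands(spam_dir, ham_dir, minimum=MIN_SAMPLES):
    """Return the sa-learn command lines, refusing to train on too few samples."""
    counts = {"spam": count_messages(spam_dir), "ham": count_messages(ham_dir)}
    too_few = {kind: n for kind, n in counts.items() if n < minimum}
    if too_few:
        raise ValueError(f"need at least {minimum} messages of each kind, have {too_few}")
    return [
        ["sa-learn", "--spam", "--dir", str(spam_dir)],
        ["sa-learn", "--ham", "--dir", str(ham_dir)],
    ]
```

Each returned list can be passed to subprocess.run(cmd, check=True) so that a failing sa-learn stops the script.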
Bayes – Enhancing • Enable automated learning: • bayes_auto_learn 1 • bayes_auto_learn_threshold_nonspam 0.1 • bayes_auto_learn_threshold_spam 6.0 • “Teach” Bayes: • Create mailboxes for “ham” and “spam” and run sa-learn on them periodically • Note: “resend” (redirect) messages into these mailboxes rather than forwarding them, so the original headers are preserved • You can’t overtrain the Bayes database!
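The two auto-learn thresholds split the score range into three zones. A simplified Python sketch of that decision, assuming scores below the non-spam threshold are learned as ham and scores at or above the spam threshold as spam. SpamAssassin's real auto-learner adds further conditions (for example, it leaves the Bayes rules' own contribution out of the score it uses), so treat this as an illustration only.

```python
# Thresholds from the slide's configuration.
THRESHOLD_NONSPAM = 0.1  # bayes_auto_learn_threshold_nonspam
THRESHOLD_SPAM = 6.0     # bayes_auto_learn_threshold_spam


def auto_learn_as(score, nonspam=THRESHOLD_NONSPAM, spam=THRESHOLD_SPAM):
    """Return "ham", "spam", or None (learn nothing) for a message's score.

    Simplified: the real SpamAssassin logic applies extra checks before learning.
    """
    if score < nonspam:
        return "ham"
    if score >= spam:
        return "spam"
    return None
```

Messages in the uncertain middle (for example a score of 3.0) are not learned automatically; those are the ones worth sorting into the manual “ham” and “spam” mailboxes.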
Bayes – Enhancing • Give more “weight” to Bayesian Results • score BAYES_00 -4 • score BAYES_05 -2 • score BAYES_95 6 • score BAYES_99 9
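These overrides apply to the rule that matches the Bayes probability band. A minimal Python sketch of that mapping, assuming the band boundaries from the stock BAYES_* rules of SpamAssassin 3.x (check the rules shipped with your version); the exact handling of values that fall on a boundary is simplified here.

```python
# Probability bands for the BAYES_* rules (assumed from SpamAssassin 3.x's stock rules).
BAYES_BANDS = [
    ("BAYES_00", 0.00, 0.01),
    ("BAYES_05", 0.01, 0.05),
    ("BAYES_20", 0.05, 0.20),
    ("BAYES_40", 0.20, 0.40),
    ("BAYES_50", 0.40, 0.60),
    ("BAYES_60", 0.60, 0.80),
    ("BAYES_80", 0.80, 0.95),
    ("BAYES_95", 0.95, 0.99),
    ("BAYES_99", 0.99, 1.00),
]

# The local score overrides from the slide.
LOCAL_SCORES = {"BAYES_00": -4.0, "BAYES_05": -2.0, "BAYES_95": 6.0, "BAYES_99": 9.0}


def bayes_rule(prob):
    """Name of the BAYES_* rule whose band contains prob (half-open bands)."""
    if not 0.0 <= prob <= 1.0:
        raise ValueError(f"probability out of range: {prob}")
    for name, low, high in BAYES_BANDS:
        if low <= prob < high:
            return name
    return BAYES_BANDS[-1][0]  # prob == 1.0


def local_score(prob):
    """Overridden score for this probability, or None if the stock score applies."""
    return LOCAL_SCORES.get(bayes_rule(prob))
```

With these overrides, a message Bayes rates above 0.99 gains 9 points, well past the default 5.0 threshold on its own, while middle bands keep their stock scores.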
Conclusion • World-class spam prevention is possible with freely available tools! • SRJC stats: • Processes 30,000-60,000 messages per day with one dual-processor server • Most messages are scanned in under 10 seconds (under 1 second without network tests) • Under 0.007% false positives/negatives
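A back-of-the-envelope check of what these figures imply, in Python. It assumes an even arrival rate over the day (real mail arrives in bursts) and takes 10 seconds as the per-message scan time, which overstates the average since the slide gives it as an upper figure for most messages. It then applies Little's law (average messages in progress = arrival rate × time per message).

```python
SECONDS_PER_DAY = 24 * 60 * 60


def load_estimate(messages_per_day, seconds_per_message, error_rate_percent):
    """Average arrival rate, average scans in progress, and misfiled mail per day."""
    arrival_rate = messages_per_day / SECONDS_PER_DAY          # messages per second
    in_flight = arrival_rate * seconds_per_message             # Little's law: L = lambda * W
    misfiled = messages_per_day * error_rate_percent / 100.0   # messages per day
    return arrival_rate, in_flight, misfiled


for volume in (30_000, 60_000):
    rate, in_flight, misfiled = load_estimate(volume, 10, 0.007)
    print(f"{volume}/day: {rate:.2f} msg/s, ~{in_flight:.1f} scans in flight, "
          f"< {misfiled:.1f} misfiled/day")
```

At the top of the range this works out to roughly 0.7 messages per second, about 7 scans in progress at once if each took the full 10 seconds, and fewer than about 4 misclassified messages per day.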