A presentation by… Bodek Frak & Craig Brown Can Spam - OUCC 2004 -
What is Spam? • Unsolicited: • You did not ask for it • Commercial: • Trying to sell you something (legal/illegal) • Bulk: • Sent in large quantities • UCE: Unsolicited commercial email • UBE: Unsolicited bulk email
What is NOT Spam? • E-mails from mailing lists that a person subscribed to and does not know how to unsubscribe from. • E-mails from on-line shopping companies where the person has shopped. • E-mails generated by viruses mailing themselves as attachments. • “Chain letters” sent to you by your friends or people you barely know…
False Positive / Negative • False Positives: legitimate e-mails that get mistakenly identified as spam. • False Negatives: spam e-mails that slip undetected past the filters. • “For most users, missing legitimate e-mail is an order of magnitude worse than receiving spam, so a filter that yields false positives is like an acne cure that carries a risk of death to the patient.”
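A minimal sketch of how the two error rates on this slide could be measured on a hand-labelled sample. The function name and the sample data are hypothetical and exist only for illustration.

```python
def error_rates(results):
    """results: iterable of (is_spam, flagged_as_spam) boolean pairs.

    Returns (false_positive_rate, false_negative_rate):
    - false positive rate = legitimate mail flagged as spam / all legitimate mail
    - false negative rate = spam that slipped through / all spam
    """
    ham = spam = false_pos = false_neg = 0
    for is_spam, flagged in results:
        if is_spam:
            spam += 1
            if not flagged:
                false_neg += 1
        else:
            ham += 1
            if flagged:
                false_pos += 1
    fp_rate = false_pos / ham if ham else 0.0
    fn_rate = false_neg / spam if spam else 0.0
    return fp_rate, fn_rate


# Hypothetical sample: 2 spam messages (1 missed), 4 legitimate (1 wrongly flagged).
sample = [
    (True, True), (True, False),
    (False, False), (False, True), (False, False), (False, False),
]
fp_rate, fn_rate = error_rates(sample)
```

Because a missed legitimate message costs far more than a delivered spam message, the false positive rate is usually the number to watch first.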
Why? • Not regulated as much as telemarketing or paper mail. • Anonymous nature of e-mail (SMTP limitations) and the Internet. • Cheap: So much spam is sent that even tiny response rates, reportedly 0.0001%, let junk mailers turn a profit. • The number of people using e-mail is rising.
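The “cheap” argument is simple arithmetic, sketched below. Only the 0.0001% response rate comes from the slide; the function names are hypothetical, and any volume or cost figure passed in is the caller's own assumption.

```python
def expected_responses(messages_sent, response_rate_percent):
    """Number of replies expected from a mailing at the given response rate."""
    return messages_sent * response_rate_percent / 100


def break_even_revenue_per_response(cost_per_message, response_rate_percent):
    """Revenue each reply must bring in to cover the cost of sending."""
    if response_rate_percent <= 0:
        raise ValueError("response rate must be positive")
    return cost_per_message / (response_rate_percent / 100)


# At the reported 0.0001% rate, a run of ten million messages
# is expected to draw about ten responses.
responses = expected_responses(10_000_000, 0.0001)
```

The lower the sender's cost per message, the less each response has to earn, which is why near-zero sending costs keep the model alive.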
How? • Bots searching web sites and newsgroup postings for anything with an “@” sign. • Directory harvest attacks. • Some dishonest companies sell their customer data. • “Unsubscribe” links in spam: clicking one confirms that the address is live. • Guessing: peter@yahoo.com, peter@hotmail.com, peters@uwindsor.ca, etc. • Viruses and worms stealing addresses from the PC’s address book.
Implications • Storage and bandwidth costs • Administrative overhead • E-mail delivery delays • E-mail server performance decreased • Lost employee productivity • Frustration (keyboards/monitors)
The Battle • Legal initiatives (new laws, lawsuits) Problem: most spam comes from outside North America • More effective and widespread spam filters (spam business model no longer profitable) Problem: high cost, complexity • Next generation (“not-so-simple”) SMTP Problem: will take time before widely adopted • Better user awareness
Approaches • Outsource: Redirect all your incoming mail through a company that filters out the spam and delivers good mail to your users • In-house: Hardware/software spam control solution deployed at the network level (server-side) or client level • Plan B: Switch to paper and pen ….
Two Extremes • Stop spam before it reaches the user • All mail is suspect and scrutinized by the gateway/server for point of origin and content; spam is discarded. • Pros: • - Avoids wasting user’s time • - Saves internal network bandwidth and server storage, processing power • Cons: • - False positives never detected (high impact) • - User not in control
Two Extremes • Let everything in • Point of origin and content not scrutinized but users are given tools to deal with unwanted mail • Pros: • - User makes his own spam decisions • - Minimal impact of false positives • Cons: • - Lost bandwidth, storage, processing power • - User learning curve
Spam Detection • SMTP Level Rules: - DNS Lookup (connecting host, return address) - RBL: DNS real-time blackhole lists (controversial) • Content Checking Rules (score): • - SMTP headers • - Content (subject line & body) • Whitelists/Blacklists • Challenge-Response
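A sketch of how an RBL check is typically performed: the connecting host's IPv4 octets are reversed, the list's DNS zone is appended, and the resulting name is looked up. If it resolves, the address is listed. The zone name `dnsbl.example.org` is a placeholder, not a real list, and the `resolve` parameter is an assumption added so the lookup can be swapped out.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up for an IPv4 address in a blackhole list."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone, resolve=socket.gethostbyname):
    """Return True if the list's zone has an entry for this address."""
    try:
        resolve(dnsbl_query_name(ip, zone))
    except OSError:  # socket.gaierror is a subclass: name did not resolve
        return False
    return True


# 192.0.2.99 is a documentation address; the zone is a placeholder.
name = dnsbl_query_name("192.0.2.99", "dnsbl.example.org")
```

A lookup failure is treated as “not listed”, so a DNS outage lets mail through rather than blocking it. That choice favours avoiding false positives.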
Bayesian Filtering (Probability) • Pros: • Widely acknowledged to be the best way to catch spam • Learns all the time, and takes into account your valid e-mails (known as “ham”) • Takes a statistical approach and does not rely solely on static updates from the vendor • Cons: • My spam may not be the same as your spam • Takes time to build the spam/ham database • Adds complexity for the user
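To make the idea concrete, here is a deliberately simplified naive Bayes sketch. It is not the algorithm of any product discussed in this presentation: it only looks at which words are present, uses add-one smoothing, and ignores words that are absent. The class name and training messages are made up.

```python
import math
import re
from collections import Counter


class TinyBayesFilter:
    """Toy naive Bayes spam scorer trained on labelled messages."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokens(text):
        # Each distinct word counts once per message.
        return set(re.findall(r"[a-z0-9$']+", text.lower()))

    def train(self, text, label):
        """label is 'spam' or 'ham' (ham = valid e-mail)."""
        self.message_counts[label] += 1
        self.word_counts[label].update(self.tokens(text))

    def spam_probability(self, text):
        n_spam = self.message_counts["spam"]
        n_ham = self.message_counts["ham"]
        # Start from the smoothed prior odds, then add per-word evidence.
        log_odds = math.log((n_spam + 1) / (n_ham + 1))
        for tok in self.tokens(text):
            p_spam = (self.word_counts["spam"][tok] + 1) / (n_spam + 2)
            p_ham = (self.word_counts["ham"][tok] + 1) / (n_ham + 2)
            log_odds += math.log(p_spam / p_ham)
        log_odds = max(min(log_odds, 700.0), -700.0)  # avoid exp() overflow
        return 1 / (1 + math.exp(-log_odds))


f = TinyBayesFilter()
f.train("cheap pills buy now", "spam")
f.train("buy cheap watches now", "spam")
f.train("meeting agenda for monday", "ham")
f.train("lunch on monday?", "ham")
```

The “my spam may not be your spam” point shows up directly here: the scores depend entirely on what each user has trained as spam and ham.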
Two Strategies • All external e-mails are legitimate except those from blacklisted (blocked) addresses and/or domains and those that do not pass the rules (more common) • All external e-mails are spam except those from whitelisted (protected) addresses and/or domains (more effective)
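The two strategies differ only in their default. A rough sketch, assuming list entries are either full addresses or bare domains; the function names and the `passes_rules` flag are hypothetical.

```python
def matches(sender, entries):
    """True if the sender's address or its domain appears in the list."""
    sender = sender.strip().lower()
    domain = sender.rpartition("@")[2]
    return sender in entries or domain in entries


def accept_default_allow(sender, blacklist, passes_rules):
    """Strategy 1: everything is legitimate unless blacklisted or failing the rules."""
    return not matches(sender, blacklist) and passes_rules


def accept_default_deny(sender, whitelist):
    """Strategy 2: everything is spam unless the sender is whitelisted."""
    return matches(sender, whitelist)


blacklist = {"bulkmail.example", "offers@shop.example"}
whitelist = {"uwindsor.ca", "friend@home.example"}
```

Under the second strategy an unknown but legitimate sender is rejected, which is why it is more effective against spam yet less common.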
Best Practices • Never make a purchase from an unsolicited e-mail • If you do not know the sender of an unsolicited e-mail message, delete it. • Never respond to any spam messages or click on any links in the message. • Avoid using the preview functionality of your e-mail client software.
Best Practices • When sending e-mail messages to a large number of recipients, use the blind copy (BCC) field to conceal their e-mail addresses • Never provide your e-mail address on websites, newsgroup lists or other online forums. • Never give your primary e-mail address to anyone or any site you don’t trust. • Have and use one or two secondary e-mail addresses.
Resources • Sophos “Field Guide To Spam” • http://www.sophos.com/spaminfo/explained/fieldguide.html • SpamNews http://www.petemoss.com • Paul Graham, who popularized Bayesian spam filtering http://www.paulgraham.com • Bayesian Spam Filtering http://email.about.com/cs/bayesianfilters
The University of Windsor Problem • Currently receiving over 200,000 e-mail messages per day • In-house filters deleting up to 100,000 e-mails per day • Many more spam messages getting through • Our end users are frustrated!
Spam Solutions • We looked at 3 categories of software solutions • - E-mail server add-on • - SMTP gateway products • - Client side products • Hardware based solutions • - Complete solution that consists of both hardware and software in one package
E-Mail Server Add-On • A solution implemented on the e-mail server • Pros: • Works in conjunction with e-mail server • Vendor specific (built for that e-mail server) • Cons: • Vendor specific (tied to one e-mail platform) • - The University of Windsor would need at least two solutions, one for faculty/staff, the other for students
SMTP-Based Gateway Products • An independent solution usually in front of e-mail servers • Pros: • Vendor independent • Identifies spam before it hits the e-mail servers • Reduces load on e-mail servers • Cons: • Usually more expensive, in some cases requires purchase of new hardware
Client Side Products • Installed on the user’s PC • Works with e-mail client to filter/tag suspected spam when messages are being downloaded from the mail server. • Pros: • Provides individual with complete control over spam filtering • Cons: • Possible deployment / support issues
Our Requirements • Minimal involvement of IT staff in maintaining the solution • Significantly higher rate of catching spam when compared to our current “in-house” solution - Currently identifying between 10% and 30% of total e-mail volume as spam • More advanced set of features and options “out of the box” • Technical support from vendor, including upgrades/updates • Ability for end users to control how their spam is dealt with • LDAP compliant
Essential Spam Detection Techniques • Keyword search and proximity search in subject/body • Keyword search using pattern matching or heuristic analysis • Message format analysis • Statistical Analysis (Including Bayesian filtering) • Blacklist of known bad e-mail and IP addresses • Whitelist • Open proxy lists, DNS verification • Ability to filter viral attachments (not essential!)
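As an illustration of the first technique, a proximity search flags a message when two keywords occur within a few words of each other. This is only a sketch; the function name, the tokenizer, and the sample sentence are assumptions.

```python
import re


def within_proximity(text, word_a, word_b, max_gap):
    """True if word_a and word_b occur at most max_gap word positions apart."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == word_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == word_b.lower()]
    return any(abs(i - j) <= max_gap for i in positions_a for j in positions_b)


subject = "Get a FREE trial of our new offer today"
near = within_proximity(subject, "free", "offer", 5)
```

Proximity matching is stricter than a plain keyword search, since both words must appear and must be close together, which tends to reduce false positives.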
Spam Engine Settings • Settings can be changed system-wide • Settings can be changed for groups of users • Settings can be changed for individual users
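One way to picture the three levels is as layered dictionaries where the most specific level wins. The precedence order (user over group over system) and the setting names are assumptions for illustration, not taken from any product.

```python
from collections import ChainMap

# Hypothetical system-wide defaults.
SYSTEM_SETTINGS = {"action": "quarantine", "spam_threshold": 50, "digest": True}


def effective_settings(system, group=None, user=None):
    """Merge settings so that user overrides group, and group overrides system."""
    # ChainMap looks a key up in its first mapping, then the next, and so on.
    return dict(ChainMap(user or {}, group or {}, system))


settings = effective_settings(
    SYSTEM_SETTINGS,
    group={"action": "tag"},
    user={"spam_threshold": 70},
)
```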
Actions on Spam • Delete the message • Quarantine the message in either a per-system or a per-user quarantine • Tag the message in its header • Tag the message by adding something to the subject line
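The two tagging actions can be sketched with Python's standard `email` package. The header names `X-Spam-Flag` and `X-Spam-Score` and the `[SPAM]` marker are common conventions used here as assumptions; each product chooses its own.

```python
from email.message import EmailMessage


def tag_header(msg, score):
    """Mark the message in its headers so mail clients can filter on it."""
    msg["X-Spam-Flag"] = "YES"
    msg["X-Spam-Score"] = f"{score:.1f}"


def tag_subject(msg, marker="[SPAM]"):
    """Prefix the subject line with a marker, without tagging it twice."""
    subject = msg.get("Subject", "")
    if not subject.startswith(marker):
        del msg["Subject"]  # no error if the header is absent
        msg["Subject"] = f"{marker} {subject}".strip()


msg = EmailMessage()
msg["Subject"] = "Win big"
tag_subject(msg)
tag_header(msg, 8.0)
```

Header tagging is invisible to the reader but easy for client-side rules to act on. Subject tagging works with any mail client but alters what the user sees.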
Updates to the Software • User or administrator updates by editing rules • User or administrator trains spam engine through human identification of spam • Vendor keeps engine updated through periodic updates
Product Selection • Using the criteria developed by the committee, a list of products that fit the criteria was established. • These products were identified through magazine articles, newsletters, other campus solutions, and previous vendor contacts
Product Selection • A number of products were eliminated from the list • Reasons for elimination included: • - Operating System (Our in-house expertise is with UNIX/Sendmail) • - Scalability • - Lack of essential features
Finally – A Shortlist! • SpamAssassin • Brightmail • Can-It Pro • PureMessage
Decisions, Decisions … • The committee reviewed the short-listed products, viewed presentations, and tested two solutions • The solutions tested were: • - Can-It Pro • - PureMessage
Live Evaluation • Each product installed on a test server • Members of committee had e-mail directed to test server • Can-It Pro • - Very low false-positive rate • - Acceptable spam capture rate out of the box • PureMessage • - High spam capture rate out of the box • - Very low false-positive rate
The Final Decision • PureMessage was chosen to be our campus-wide spam solution • High spam capture rate • Low false-positive rate • Low maintenance • Offers all the features we required • Fit our budget
Implementation • Solution will run on four Sun Fire V440 servers • 2 servers for incoming mail, 2 servers for outgoing mail, 1 server for spooling • Currently waiting for server hardware to arrive • Current implementation target: Summer 2004 • Support staff have been added to the current pilot to test
Default Configuration • Default configuration will be “Quarantine” • Other options include “Tag & Deliver” and “Opt-Out”
Default Configuration • True spam (messages scoring above 90%) will be deleted • Possible spam (messages scoring 50-90%) will be quarantined • Legitimate mail (messages scoring below 50%) will be delivered • A large number of messages will never reach the messaging servers • A daily digest will be mailed each morning, detailing messages in quarantine from the previous day • End users will have on-demand access to their quarantine through a web-based interface
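The three score bands translate into a small decision function. This is a sketch of the policy as stated on the slide, not PureMessage configuration syntax; the function name is hypothetical, and scores of exactly 50% and 90% are assumed to fall in the quarantine band.

```python
def disposition(spam_score_percent):
    """Map a spam score (0-100) to the default action described above."""
    if spam_score_percent > 90:
        return "delete"      # true spam: never reaches the messaging servers
    if spam_score_percent >= 50:
        return "quarantine"  # possible spam: listed in the daily digest
    return "deliver"         # legitimate mail


actions = [disposition(score) for score in (12, 50, 75, 90, 97)]
```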
Default Configuration • In addition to the PureMessage AntiSpam solution, we also purchased the PureMessage Policy Manager • This will allow us to filter out unwanted attachments that may be viral – EXE, COM, BAT, PIF, SCR, etc.
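A rough illustration of an extension-based attachment policy. It does not show how PureMessage Policy Manager is configured; the function name is hypothetical, and the blocked set contains only the extensions named on the slide.

```python
from pathlib import PurePath

# Extensions listed on the slide; a real policy would likely include more.
BLOCKED_EXTENSIONS = {".exe", ".com", ".bat", ".pif", ".scr"}


def is_blocked_attachment(filename):
    """True if the file's final extension is on the blocked list."""
    # Only the last suffix matters, so "invoice.pdf.exe" is still caught.
    return PurePath(filename.strip()).suffix.lower() in BLOCKED_EXTENSIONS


verdicts = {
    name: is_blocked_attachment(name)
    for name in ("setup.EXE", "invoice.pdf.exe", "report.pdf", "notes.txt")
}
```

Checking the final extension rather than the first one matters because viral attachments often hide behind a harmless-looking double extension.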
Changes to E-Mail Handling • In addition to spam filtering, other initiatives have been proposed to curb the amount of spam on campus. These include: • - Only accepting mail addressed to valid addresses • - Only accepting mail with valid return addresses (domain must resolve in DNS)
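The return-address check can be sketched as follows. A production mail server would normally look for MX records. The standard library has no MX lookup, so this sketch only checks that the domain resolves to an address, and that substitution is an assumption. The function names and the injectable `resolve` parameter are hypothetical.

```python
import socket


def return_domain(address):
    """Extract the lower-cased domain from a return address, or None if malformed."""
    local, at, domain = address.strip().strip("<>").rpartition("@")
    return domain.lower() if at and local and domain else None


def has_resolvable_domain(address, resolve=socket.getaddrinfo):
    """True if the return address has a domain that resolves in DNS."""
    domain = return_domain(address)
    if domain is None:
        return False
    try:
        resolve(domain, None)
    except (OSError, UnicodeError):  # lookup failed or name is not encodable
        return False
    return True


domain = return_domain("<Peters@UWindsor.ca>")
```

A temporary DNS failure looks the same as a nonexistent domain in this sketch. A real server would distinguish the two and defer the message rather than reject it.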
Implementation Issues • Some users on other non-ITS controlled servers may not be in our LDAP database • Possible issues with the handling of ListServs • Possible issues with Shared Mailboxes