Spam: An Analysis of Spam Filters
Joe Chiarella, Jason O’Brien
Advisors: Professor Wills and Professor Claypool
Project Goals
• To analyze the effectiveness of different kinds of spam filters.
• Focused on SpamAssassin and Bogofilter.
SpamAssassin
• Rule-based filter with over 400 rules.
• Each rule has an associated weight.
• The score of an email is the sum of the weights of all matching rules.
• The spam threshold is user-adjustable.
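The weighted-rule scoring described above can be sketched in a few lines of Python. This is only an illustration of the idea: the three rules, their weights, and the names `RULES`, `score_email`, and `is_spam` are hypothetical and are not taken from SpamAssassin's actual rule set, and the threshold value is just an example of the user-adjustable setting.

```python
import re

# Hypothetical rules for illustration only: (name, pattern, weight).
# Positive weights suggest spam; negative weights suggest ham.
RULES = [
    ("MENTIONS_FREE_MONEY", re.compile(r"free money", re.I), 2.5),
    ("ALL_CAPS_SUBJECT", re.compile(r"^Subject: [A-Z !]+$", re.M), 1.5),
    ("KNOWN_SENDER", re.compile(r"^From: .*@example\.edu", re.M), -1.0),
]


def score_email(text, rules=RULES):
    """Sum the weights of every rule whose pattern matches the message."""
    return sum(weight for _, pattern, weight in rules if pattern.search(text))


def is_spam(text, threshold=5.0, rules=RULES):
    """Flag the message when its score reaches the (adjustable) threshold."""
    return score_email(text, rules) >= threshold


message = "From: bob@spam.test\nSubject: WIN NOW!\n\nGet free money today"
print(score_email(message))             # two rules match
print(is_spam(message))                 # below the example threshold
print(is_spam(message, threshold=3.0))  # a stricter user setting flags it
```

Lowering the threshold catches more spam at the cost of flagging more ham, which is why the setting is left to the user.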
Bogofilter
• Bayesian filter.
• Calculates the probability that an email is spam based on past email.
• Looks at the frequency of words, not their order.
• Accuracy should improve over time as it learns from more email.
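To make the word-frequency idea concrete, here is a minimal sketch of a naive Bayes-style estimate over word presence with add-one smoothing. It is a simplified stand-in and does not reproduce Bogofilter's actual combining method; the class `WordFrequencyFilter` and its methods are hypothetical names chosen for this example.

```python
import math
import re
from collections import Counter


class WordFrequencyFilter:
    """Toy Bayesian filter: learns how often each word appears in spam vs. ham."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        # A set of lowercase words: order and repetition are ignored.
        return set(re.findall(r"[a-z0-9$']+", text.lower()))

    def train(self, text, label):
        """Record one past message; label is 'spam' or 'ham'."""
        self.word_counts[label].update(self.tokenize(text))
        self.message_counts[label] += 1

    def spam_probability(self, text):
        total = sum(self.message_counts.values())
        if total == 0:
            return 0.5  # nothing learned yet, so no opinion either way
        words = self.tokenize(text)
        log_scores = {}
        for label in ("spam", "ham"):
            n = self.message_counts[label]
            # Smoothed prior, then smoothed per-word likelihoods.
            log_score = math.log((n + 1) / (total + 2))
            for word in words:
                seen = self.word_counts[label][word]
                log_score += math.log((seen + 1) / (n + 2))
            log_scores[label] = log_score
        # Turn the two log scores into a probability, clamped to avoid overflow.
        diff = log_scores["ham"] - log_scores["spam"]
        diff = max(min(diff, 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))


f = WordFrequencyFilter()
f.train("cheap pills buy now", "spam")
f.train("buy cheap watches now", "spam")
f.train("meeting notes attached", "ham")
f.train("lunch at noon tomorrow", "ham")
print(f.spam_probability("buy cheap pills"))   # leans toward spam
print(f.spam_probability("meeting tomorrow"))  # leans toward ham
```

Because every call to `train` changes the counts, the estimate depends on how much mail the filter has already seen, which matches the expectation that accuracy improves over time.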
Data Collection
• Email collected from students, professors, small business employees, and free email accounts.
• 4626 ham (legitimate) emails and 5010 spam emails, separated into ham and spam mailboxes for each user.
Methodology
• Compared the accuracy of SpamAssassin and Bogofilter on each user’s email.
• Tested the same number of ham emails and spam emails from each user.
• Ignored the results from the first 50 emails to allow Bogofilter to learn.
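One way to picture the warm-up step is the evaluation loop below. It is a sketch under assumptions the slides do not state: messages are processed in order, the filter is trained on the true label after each classification, and accuracy is reported separately for ham and spam. The function `evaluate` and its parameters are hypothetical.

```python
def evaluate(messages, classify, train=None, warmup=50):
    """Score a filter on (text, is_spam) pairs, ignoring the first `warmup` results.

    classify(text) returns True for spam; train(text, is_spam), if given,
    is called after every message so a learning filter keeps improving.
    """
    seen = {True: 0, False: 0}
    correct = {True: 0, False: 0}
    for index, (text, is_spam) in enumerate(messages):
        predicted = classify(text)
        if index >= warmup:  # results during the warm-up are not counted
            seen[is_spam] += 1
            if predicted == is_spam:
                correct[is_spam] += 1
        if train is not None:
            train(text, is_spam)
    return {
        "ham_accuracy": correct[False] / seen[False] if seen[False] else None,
        "spam_accuracy": correct[True] / seen[True] if seen[True] else None,
    }


# Toy usage with a fixed keyword classifier and a two-message warm-up.
sample = [
    ("spam offer a", True),
    ("hello there", False),
    ("spam offer b", True),
    ("re: spam policy", False),
    ("limited deal", True),
]
print(evaluate(sample, classify=lambda text: "spam" in text, warmup=2))
```

Reporting ham and spam accuracy separately keeps false positives and false negatives visible instead of averaging them into a single number.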
Comparison of Bogofilter and SpamAssassin on Ham
CP = Company Person, PR = Professor, ST = Student, FE = Free Email
Comparison of Bogofilter and SpamAssassin on Spam
CP = Company Person, PR = Professor, ST = Student, FE = Free Email
Conclusion
• The effectiveness of Bogofilter and SpamAssassin depends greatly on the user.
• Neither filter outperformed the other in all cases.
• Filtering spam is hard.