
Detecting Spammers with SNARE: Spatio-temporal Network-level Automatic Reputation Engine

Shuang Hao, Nadeem Ahmed Syed, Nick Feamster, Alexander G. Gray, Sven Krasser. Klevis Luli.


Presentation Transcript


  1. Detecting Spammers with SNARE: Spatio-temporal Network-level Automatic Reputation Engine • Shuang Hao, Nadeem Ahmed Syed, Nick Feamster, Alexander G. Gray, Sven Krasser • Klevis Luli

  2. SNARE Overview • Sender reputation system that automatically classifies email senders based on network-level features • Lightweight: no content checking • Does not rely on blacklisting • Features that help distinguish spammers from legitimate senders • Automated reputation engine • Implementation • Evasion and limitations • Evaluation • Future work

  3. Single-packet features • Require no previous history from the IP address, only a single packet from the address in question • Receiver does not need to accept the connection request • Geographic distance: spam tends to travel longer geographic distances between sender and receiver • Sender neighborhood density: a cluster of senders in a small address space could be a botnet • Probability ratio of spam to ham (genuine email) at the time of day the IP packet arrives: legitimate email follows a recognizable time-of-day pattern • AS number of sender: more reliable than the IP address, since a large amount of spam comes from a small number of ASes • Open ports on sender: legitimate mail senders usually provide other services, so they listen on more than one port
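Two of the single-packet features above, geographic distance and sender neighborhood density, can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the function names, the use of the haversine formula, and the choice of averaging the numeric gap to the `k` nearest sender IPs are all assumptions made for this example.

```python
import ipaddress
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, used by the haversine formula


def geodesic_distance_km(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance between two points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def neighbor_gap(sender_ip, other_senders, k=3):
    """Average numeric distance from sender_ip to its k nearest sender IPs.

    A small value means the sender sits in a dense cluster of senders,
    which the slide associates with possible botnet activity.
    (Hypothetical helper: k and the averaging are assumptions.)
    """
    target = int(ipaddress.IPv4Address(sender_ip))
    gaps = sorted(
        abs(int(ipaddress.IPv4Address(ip)) - target)
        for ip in other_senders
        if ip != sender_ip
    )
    nearest = gaps[:k]
    if not nearest:
        return float("inf")  # no other senders observed
    return sum(nearest) / len(nearest)
```

For example, `neighbor_gap("10.0.0.5", ["10.0.0.1", "10.0.0.6", "10.0.0.9"])` averages the gaps 1, 4, and 4. Both values would be fed to the classifier as numeric inputs rather than thresholded by hand.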

  4. Single-packet features

  5. Single-header and single-message features • Collected after looking at SMTP headers or messages • Receiver accepts the connection • Provide increased confidence • Number of recipients in the “To” field: spam usually has more recipients than ham • Length of message: spam tends to be short and less random • Aggregate features: constructed if some history from an IP is available • By summarizing behavior over multiple messages and over time, these aggregate features may yield a more reliable prediction • They summarize the geodesic distance between the sender and recipient, the number of recipients in the “To” field of the SMTP header, and the message body length in bytes • They come at the cost of increased latency because messages need to be collected first
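The aggregate features can be illustrated with a short Python sketch that summarizes several messages from one sender IP. The slide does not say which summary statistics are used, so the mean and population variance here are assumptions, as are the field names `distance_km`, `num_recipients`, and `body_bytes`.

```python
from statistics import mean, pvariance

# Per-message fields to summarize (hypothetical names for this sketch).
FIELDS = ("distance_km", "num_recipients", "body_bytes")


def aggregate_features(messages):
    """Summarize per-message features over all messages seen from one IP.

    `messages` is a non-empty list of dicts, one per message, each holding
    the keys in FIELDS. Mean and variance are assumed summary statistics.
    """
    summary = {}
    for field in FIELDS:
        values = [msg[field] for msg in messages]
        summary[f"{field}_mean"] = mean(values)
        summary[f"{field}_var"] = pvariance(values)
    return summary
```

Because the summary needs several messages per sender, it can only be computed after those messages have arrived, which is the latency cost the slide mentions.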

  6. Automated reputation engine • Uses the RuleFit supervised learning algorithm • x denotes the input variables and f(x) the “base learner” functions • Rules derived from decision trees are used as the base learners • Automatically classifies email after being trained • Can evaluate the relative importance of features • Input variables that frequently appear in important rules or base learners are deemed more relevant
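The idea of scoring a sender with a weighted set of rules can be sketched as follows. This is a toy illustration of a rule ensemble, not the RuleFit algorithm itself: the rules, thresholds, coefficients, and intercept are invented for the example (RuleFit would learn them from training data), and the importance measure is a simplification that splits each rule's absolute coefficient evenly among the features it tests.

```python
# Each rule is (coefficient, {feature_name: predicate}). All values here are
# made up for illustration; a trained model would supply them.
RULES = [
    (1.2, {"distance_km": lambda v: v > 4000}),
    (0.8, {"num_open_ports": lambda v: v == 0,
           "neighbor_gap": lambda v: v < 100}),
    (-0.9, {"num_open_ports": lambda v: v >= 2}),
]
INTERCEPT = -0.5  # hypothetical bias term


def rule_fires(conditions, x):
    """A rule fires only if every one of its conditions holds for x."""
    return all(pred(x[name]) for name, pred in conditions.items())


def score(x):
    """Intercept plus the coefficients of all rules that fire on x."""
    return INTERCEPT + sum(c for c, conds in RULES if rule_fires(conds, x))


def feature_importance():
    """Simplified importance: share each rule's |coefficient| equally
    among the features that appear in it, then sum per feature."""
    importance = {}
    for coef, conds in RULES:
        share = abs(coef) / len(conds)
        for name in conds:
            importance[name] = importance.get(name, 0.0) + share
    return importance
```

A sender with a higher score would be treated as more likely to be a spammer; where to set the cutoff is a separate tuning decision. Features that appear in many heavily weighted rules end up with larger importance values, matching the slide's last point.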

  7. Implementation Other scenarios: • A standalone DNS-based Blacklist • A first-pass filter before existing mechanisms

  8. Evasion and Limitations • AS numbers: a robust indicator of malicious hosts, since it is not easy for spammers to move mail servers or bot armies to a different AS • Message length: knowing that SNARE checks message length, a spammer might start to randomize the lengths of emails • Nearest neighbor: hard to modify; however, the botnet controller could direct bots on the same subnet to target different sets of destinations • Open ports: legitimate hosts could be blocking port scans • Geodesic distance: a spammer could modify bots to send to closer recipients • Number of recipients: a spammer could send to individual recipients one by one • Time of day: botnets could send email during legitimate peak hours to look legitimate • Authors' main argument: the above changes are difficult or would limit the flexibility and efficiency of the botnet • Other limitations: scaling, web-based email accounts

  9. Evaluation • 14 days of data, from October 22, 2007 to November 4, 2007 • The data trace is divided into two parts: • The first half is used for the measurement study • The other half is used to evaluate SNARE’s performance • RuleFit is trained with 1 million randomly sampled messages from each day (5% to 95% spam to ham ratio)
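The trace split described above can be sketched in Python. The dates come from the slide; the even 7-day/7-day split by calendar day, the function names, and the fixed random seed are assumptions for illustration. The sketch does not attempt to reproduce the spam-to-ham ratio.

```python
import random
from datetime import date, timedelta

TRACE_START = date(2007, 10, 22)
TRACE_END = date(2007, 11, 4)


def split_trace_days():
    """Return (measurement_days, evaluation_days), splitting the trace
    in half by calendar day (an assumed interpretation of 'first half')."""
    n_days = (TRACE_END - TRACE_START).days + 1
    days = [TRACE_START + timedelta(days=i) for i in range(n_days)]
    half = len(days) // 2
    return days[:half], days[half:]


def sample_day(messages, n=1_000_000, seed=0):
    """Randomly sample up to n messages from one day's messages."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    return rng.sample(messages, min(n, len(messages)))
```

Sampling per day rather than over the whole trace keeps each day equally represented in the training set.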

  10. Evaluation

  11. Future Work • Incorporating temporal features into the classification engine • Making SNARE more evasion-resistant • Refining the whitelist

  12. Thank you!
