Detecting Spammers with SNARE: Spatio-temporal Network-level Automatic Reputation Engine
Shuang Hao, Nadeem Ahmed Syed, Nick Feamster, Alexander G. Gray, Sven Krasser
Klevis Luli
SNARE Overview
• Sender reputation system that automatically classifies email senders based on various network-level features
• No content checking, lightweight
• Not blacklisting
• Features that help distinguish spammers from legitimate senders
• Automated Reputation Engine
• Implementation
• Evasion and Limitations
• Evaluation
• Future Work
Single-packet features
• No previous history from the IP address is needed, only a single packet from the IP address in question
• Receiver does not need to accept the connection request
• Geographic distance: spam tends to travel longer geographic distances between sender and receiver
• Sender neighborhood density: a cluster of senders in a small address space could be a botnet
• Probability ratio of spam to ham (genuine email) at the time of day the IP packet arrives: legitimate email follows a certain time-of-day trend
• AS number of sender: more reliable than the IP address; a large amount of spam comes from a small number of ASes
• Open ports on sender: legitimate mail senders usually provide other services as well, so they listen on more than one port
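Two of the single-packet features above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the sender and receiver have already been geolocated to latitude/longitude, uses the haversine great-circle formula for distance, and measures neighborhood density as the average numeric IPv4 distance to the k nearest other senders. The function names and the default `k=20` are assumptions made for the example.

```python
import ipaddress
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def geodesic_distance_km(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def neighborhood_density(sender_ip, other_sender_ips, k=20):
    """Average numeric IPv4 distance from sender_ip to its k nearest senders.

    A small value means many senders sit close together in address space.
    k=20 is an assumed default for this sketch.
    """
    target = int(ipaddress.IPv4Address(sender_ip))
    distances = sorted(
        abs(int(ipaddress.IPv4Address(ip)) - target)
        for ip in other_sender_ips
        if ip != sender_ip
    )
    nearest = distances[:k]
    if not nearest:
        return float("inf")  # no other senders seen
    return sum(nearest) / len(nearest)
```

A sender whose nearest neighbors are only a few addresses away, combined with a long sender-to-receiver distance, would look more like a bot than a sender with no nearby neighbors.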
Single-header and single-message features
• Collected after looking at SMTP headers or messages
• Receiver accepts the connection
• Provide increased confidence
• Number of recipients in the “To” field: spam usually has more recipients than ham
• Length of message: spam tends to be short and less random
Aggregate features
• Constructed if some history from an IP is available
• By summarizing behavior over multiple messages and over time, these aggregate features may yield a more reliable prediction
• Features summarized: geodesic distance between the sender and recipient; number of recipients in the “To” field of the SMTP header; message body length in bytes
• Comes at the cost of increased latency because messages need to be collected first
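The aggregation step can be illustrated as follows. This is a sketch under stated assumptions: the slides only say that behavior is summarized over multiple messages, so the choice of mean and variance as the summary statistics, the dictionary keys, and the function name are all assumptions made for the example.

```python
from statistics import mean, pvariance

# Hypothetical per-message feature names used only in this sketch.
FEATURE_NAMES = ("geodesic_km", "num_recipients", "body_bytes")


def aggregate_features(messages):
    """Summarize per-message features from one sender IP.

    `messages` is a non-empty list of dicts, one per observed message.
    Returns the mean and population variance of each feature.
    """
    summary = {}
    for name in FEATURE_NAMES:
        values = [m[name] for m in messages]
        summary[name + "_mean"] = mean(values)
        summary[name + "_var"] = pvariance(values)
    return summary
```

The latency cost mentioned above shows up directly: `aggregate_features` cannot run until several messages from the same IP have been collected.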
Automated reputation engine
• Uses the RuleFit supervised learning algorithm
• x denotes the input variables, f(x) the “base learner” functions
• Rules from decision trees are used as “base learners”
• Automatically classifies email after being trained
• Can evaluate the relative importance of features
• Input variables that frequently appear in important rules or basis functions are deemed more relevant
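RuleFit models take the form F(x) = a_0 + Σ_m a_m f_m(x), where each base learner f_m is a rule (a conjunction of simple tests taken from a decision-tree path) that evaluates to 0 or 1. The sketch below shows only how such a fitted model would score a sender; it does not train anything. Every rule, threshold, coefficient, and feature name in it is made up for illustration.

```python
# Each entry pairs a coefficient a_m with a rule f_m(x).
# All values here are hypothetical, not learned from real data.
RULES = [
    (0.8, lambda x: x["geodesic_km"] > 4000 and x["open_ports"] == 0),
    (0.5, lambda x: x["neighbor_dist"] < 1000),
    (-0.9, lambda x: x["open_ports"] >= 2),
]
INTERCEPT = -0.2  # a_0, also hypothetical


def rulefit_score(x):
    """Evaluate F(x) = a_0 + sum_m a_m * f_m(x) for one feature dict."""
    return INTERCEPT + sum(coef * int(rule(x)) for coef, rule in RULES)


def classify(x, threshold=0.0):
    """Label a sender as spam when its score exceeds the threshold."""
    return "spam" if rulefit_score(x) > threshold else "ham"
```

Moving `threshold` trades detection rate against false positives, and features that appear in rules with large coefficients contribute most to the score.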
Implementation
Other scenarios:
• A standalone DNS-based blacklist
• A first-pass filter before existing mechanisms
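For the DNS-based blacklist scenario, one plausible interface is the common DNSBL query convention, in which the sender's IPv4 octets are reversed and prepended to the list's zone. The sketch below only builds the query name; the zone `snare.example` and the function name are hypothetical, and nothing here describes how the authors deployed SNARE.

```python
import ipaddress


def dnsbl_query_name(sender_ip, zone="snare.example"):
    """Build a DNSBL-style query name: reversed IPv4 octets plus the zone."""
    # IPv4Address validates the input and normalizes it to dotted-quad form.
    octets = str(ipaddress.IPv4Address(sender_ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone
```

A mail server would resolve the resulting name and treat a successful answer as "listed"; the lookup itself is left out so the example stays free of network calls.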
Evasion and Limitations
• AS numbers: a robust indicator of malicious hosts; it is not easy for spammers to move mail servers or bot armies to a different AS
• Message length: knowing that SNARE checks the length of messages, a spammer might start to randomize the lengths of their emails
• Nearest neighbor: hard to modify; however, the botnet controller could direct bots on the same subnet to target different sets of destinations
• Open ports: legitimate hosts could be blocking port scans
• Geodesic distance: a spammer could modify bots to send to closer recipients
• Number of recipients: a spammer could send to individual recipients one by one
• Time of day: botnets could send email during legitimate peak hours to look legitimate
• Authors' main argument: the above changes are difficult or would limit the flexibility and efficiency of the botnet
• Other limitations: scaling, Web-based email accounts
Evaluation
• 14 days of data, October 22, 2007 to November 4, 2007
• Data trace is divided into two parts:
  • The first half is used for the measurement study
  • The other half is used to evaluate SNARE’s performance
• RuleFit is trained with 1 million randomly sampled messages from each day, with a ratio of 5% ham to 95% spam
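The date range above can be checked and split with a short calculation. This sketch assumes the trace is divided by calendar day into two equal halves; the slides do not state exactly where the boundary falls, so the resulting cut is an assumption.

```python
from datetime import date, timedelta

start, end = date(2007, 10, 22), date(2007, 11, 4)

# Enumerate every day in the trace, both endpoints included.
days = [start + timedelta(days=i) for i in range((end - start).days + 1)]

# Assumed split: first half for measurement, second half for evaluation.
half = len(days) // 2
measurement_days, evaluation_days = days[:half], days[half:]
```

Under this assumption the range does contain 14 days, and each half covers seven of them.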
Future Work
• Incorporating temporal features into the classification engine
• Making SNARE more evasion-resistant
• Refining the whitelist