Detecting Spammers with SNARE: Spatio-temporal Network-level Automatic Reputation Engine

Detecting Spammers with SNARE: Spatio-temporal Network-level Automatic Reputation Engine • Shuang Hao, Nadeem Ahmed Syed, Nick Feamster, Alexander G. Gray, Sven Krasser • Klevis Luli

SNARE Overview • Sender reputation system that automatically classifies email senders based on various network-level features. • No content checking, lightweight • Not blacklisting • Features that help distinguish spammers from legitimate senders • Automated Reputation Engine • Implementation • Evasion and Limitations • Evaluation • Future Work

Single-packet features • No previous history from the IP address is needed, only a single packet from the IP address in question • Receiver does not need to accept the connection request • Geographic distance: spam tends to travel longer geographic distances between sender and receiver • Sender neighborhood density: a cluster of senders in a small address space could be a botnet • Probability ratio of spam to ham (genuine email) at the time of day the IP packet arrives: legitimate email follows a characteristic daily pattern • AS number of sender: more reliable than the IP address; a large amount of spam comes from a small number of ASes • Open ports on sender: legitimate mail senders usually provide other services as well, so they listen on more than one port
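As a rough illustration of two of the features above, the following Python sketch computes a great-circle distance between sender and receiver and a simple neighborhood density. The haversine formula, the function names, and the parameter `k` are assumptions made for this example; the slides do not specify how SNARE computes either value.

```python
import ipaddress
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def geodesic_distance_km(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def neighborhood_density(sender_ip, other_ips, k=20):
    """Mean numeric distance from sender_ip to its k nearest IPv4 senders.

    Smaller values mean the sender sits in a denser cluster of senders.
    Returns None when there are no other senders to compare against.
    """
    target = int(ipaddress.IPv4Address(sender_ip))
    gaps = sorted(
        abs(int(ipaddress.IPv4Address(ip)) - target)
        for ip in other_ips
        if ip != sender_ip
    )
    nearest = gaps[:k]
    if not nearest:
        return None
    return sum(nearest) / len(nearest)
```

A sender whose nearest neighbors are only a few addresses away would get a small density value, which matches the intuition that bots cluster in a small address space.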

Single-header and single-message features • Collected after looking at SMTP headers or messages • Receiver accepts the connection • Provide increased confidence • Number of recipients in the To field: spam usually has more recipients than ham • Length of message: spam tends to be short and less random

Aggregate features • Constructed if some history from an IP is available • By summarizing behavior over multiple messages and over time, these aggregate features may yield a more reliable prediction • Summarized features: geodesic distance between the sender and recipient, number of recipients in the “To” field of the SMTP header, message body length in bytes • Comes at the cost of increased latency because messages need to be collected first
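One way to picture the aggregation step is the sketch below, which groups messages by sender IP and summarizes the three per-message features. Using the mean and population variance as the summary statistics is an assumption for illustration, as are the dictionary keys and the function name.

```python
from collections import defaultdict
from statistics import mean, pvariance

# Hypothetical per-message field names, chosen for this example only.
FIELDS = ("distance_km", "num_recipients", "body_length")


def aggregate_features(messages):
    """Summarize per-message features for each sender IP.

    messages: iterable of dicts with a "sender_ip" key plus the keys in FIELDS.
    Returns {sender_ip: {"<field>_mean": ..., "<field>_var": ...}}.
    """
    by_sender = defaultdict(list)
    for msg in messages:
        by_sender[msg["sender_ip"]].append(msg)

    summary = {}
    for ip, msgs in by_sender.items():
        feats = {}
        for field in FIELDS:
            values = [m[field] for m in msgs]
            feats[field + "_mean"] = mean(values)
            feats[field + "_var"] = pvariance(values)
        summary[ip] = feats
    return summary
```

The latency cost mentioned above shows up here directly: the summary for an IP is only meaningful once several of its messages have been collected.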

Automated reputation engine • RuleFit supervised learning algorithm • x for input variables, f(x) for “base learner” functions • Rules in a decision tree are used as “base learners” • Automatically classifies email after being trained • Can evaluate the relative importance of features • Input variables that frequently appear in important rules or basis functions are deemed more relevant
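To make the idea of rules as base learners concrete, here is a minimal sketch of scoring with a rule ensemble: each rule is a conjunction of threshold tests that returns 1 or 0, and the score is an intercept plus a weighted sum of the rules that fire. The rules, coefficients, feature names, and threshold below are invented for illustration; a real RuleFit model learns them from training data, and this is not the authors' trained model.

```python
def make_rule(conditions):
    """Build a rule from (feature, op, threshold) tests, with op in {"<=", ">"}.

    The rule returns 1 only if every test holds for the input x, else 0.
    """
    def rule(x):
        for feature, op, threshold in conditions:
            value = x[feature]
            holds = value <= threshold if op == "<=" else value > threshold
            if not holds:
                return 0
        return 1
    return rule


# Hypothetical (coefficient, rule) pairs; positive coefficients push toward spam.
RULES = [
    (1.2, make_rule([("distance_km", ">", 4000), ("open_ports", "<=", 0)])),
    (0.8, make_rule([("num_recipients", ">", 10)])),
    (-1.5, make_rule([("open_ports", ">", 1)])),
]
INTERCEPT = -0.5  # hypothetical


def score(x):
    """Linear combination of rule outputs: intercept + sum(coef * rule(x))."""
    return INTERCEPT + sum(coef * rule(x) for coef, rule in RULES)


def is_spam(x, threshold=0.0):
    """Classify as spam when the score exceeds an assumed decision threshold."""
    return score(x) > threshold
```

In this picture, a feature such as `open_ports` that appears in several heavily weighted rules would count as more relevant than one that appears rarely.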

Implementation Other scenarios: • A standalone DNS-based Blacklist • A first-pass filter before existing mechanisms
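For the DNS-based blacklist scenario, clients of such lists conventionally look up a name formed from the reversed IPv4 octets followed by the list's zone. The sketch below only builds that query name and performs no DNS lookup; the zone `snare.example` and the function name are hypothetical, and the slides do not describe how a SNARE-backed list would be deployed.

```python
import ipaddress


def dnsbl_query_name(sender_ip, zone="snare.example"):
    """Return the DNSBL-style query name for an IPv4 sender.

    Example shape: 192.0.2.99 with zone "snare.example"
    becomes "99.2.0.192.snare.example".
    """
    # IPv4Address validates the input and raises ValueError on bad addresses.
    addr = ipaddress.IPv4Address(sender_ip)
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return reversed_octets + "." + zone
```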

Evasion and Limitations • AS numbers: A robust indicator of malicious hosts; it is not easy for spammers to move mail servers or bot armies to a different AS • Message length: Knowing that SNARE checks the length of the message, a spammer might start to randomize the lengths of their emails • Nearest neighbor: Hard to modify. However, the botnet controller could direct bots on the same subnet to target different sets of destinations • Open ports: Legitimate hosts could be blocking port scans • Geodesic distance: A spammer could modify bots to send to closer recipients • Number of recipients: A spammer could send to individual recipients one by one • Time of day: Botnets could send email during legitimate peak hours to look legitimate • Authors' main argument: The above changes are difficult or would limit the flexibility and efficiency of the botnet • Other limitations: Scaling, web-based email accounts

Evaluation • 14 days of data, October 22, 2007 to November 4, 2007 • Data trace is divided into two parts: • The first half is used for the measurement study • The other half is used to evaluate SNARE’s performance • RuleFit trained with 1 million randomly sampled messages from each day (5% to 95% spam to ham ratio)
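The data handling described above can be sketched as follows: enumerate the days in the trace, split them into two halves, and draw a fixed-size random sample from each day's messages. The function names and the fixed seed are assumptions for this example, and the sketch does not reproduce the spam-to-ham ratio of the real sample.

```python
import random
from datetime import date, timedelta

TRACE_START = date(2007, 10, 22)
TRACE_END = date(2007, 11, 4)


def trace_days(start=TRACE_START, end=TRACE_END):
    """List every calendar day in the trace, both endpoints included."""
    n_days = (end - start).days + 1
    return [start + timedelta(days=i) for i in range(n_days)]


def split_halves(days):
    """Split the days into a measurement half and an evaluation half."""
    mid = len(days) // 2
    return days[:mid], days[mid:]


def sample_day(messages, n, seed=0):
    """Randomly sample up to n messages from one day's list, without replacement."""
    if len(messages) <= n:
        return list(messages)
    return random.Random(seed).sample(messages, n)
```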

Future Work • Incorporating temporal features into the classification engine • Making SNARE more evasion-resistant • Refining the whitelist

Thank you!
