Botnet and Spam Detection in High-Speed Networks. Wenke Lee and Nick Feamster, Georgia Tech. Overview. Problem: botnet and spam detection in high-speed networks. Common theme: examine network-level properties and build a classifier. Two systems: BotMiner and SNARE.
Botnet and Spam Detection in High-Speed Networks • Wenke Lee and Nick Feamster, Georgia Tech
Overview • Problem: Botnet and spam detection in high-speed networks • Common theme: Examine network-level properties and build a classifier • Two systems: BotMiner and SNARE • Overview • Integration with SMITE architecture • Current integration status and plan
BotMiner: Structure and Protocol Independent • Botnets can change their C&C content (encryption, etc.), protocols (IRC, HTTP, etc.), structures (P2P, etc.), C&C servers, infection models …
Definition of a Botnet • “A coordinated group of malware instances that are controlled by a botmaster via some C&C channel” • Hosts that have similar C&C-like traffic and similar malicious activities • We need to monitor two planes • C-plane (C&C communication plane): “who is talking to whom” • A-plane (malicious activity plane): “who is doing what”
BotMiner Architecture Sensors Algorithms Correlation
BotMiner C-plane Clustering • What characterizes a communication flow (C-flow) between a local host and a remote service? • <protocol, srcIP, dstIP, dstPort> • Temporal related statistical distribution information • E.g., BPS (bytes per second), FPH (flows per hour) • Spatial related statistical distribution information • E.g., BPP (bytes per packet), PPF (packets per flow)
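The C-flow features above can be illustrated with a minimal Python sketch. It groups flow records by the <protocol, srcIP, dstIP, dstPort> key and computes one average value per feature. The record field names (`bytes`, `packets`, `duration_s`) and the use of plain averages rather than full statistical distributions are assumptions made for illustration, not BotMiner's actual implementation.

```python
from collections import defaultdict


def cflow_features(flows, window_hours):
    """Aggregate flow records into per-C-flow summary features.

    flows: iterable of dicts with the hypothetical keys
           protocol, src_ip, dst_ip, dst_port, bytes, packets, duration_s.
    window_hours: length of the observation window, in hours.
    """
    groups = defaultdict(list)
    for f in flows:
        key = (f["protocol"], f["src_ip"], f["dst_ip"], f["dst_port"])
        groups[key].append(f)

    features = {}
    for key, group in groups.items():
        total_bytes = sum(f["bytes"] for f in group)
        total_packets = sum(f["packets"] for f in group)
        total_seconds = sum(f["duration_s"] for f in group)
        features[key] = {
            # Temporal features
            "fph": len(group) / window_hours,
            "bps": total_bytes / total_seconds if total_seconds else 0.0,
            # Spatial features
            "ppf": total_packets / len(group),
            "bpp": total_bytes / total_packets if total_packets else 0.0,
        }
    return features
```

A real system would likely keep the distribution of each feature (for example, binned per-flow values) instead of a single average, since the slide refers to statistical distribution information.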
A-plane Clustering • Capture “similar activities patterns”
Cross-plane Correlation • Botnet score s(h) for every host h • A host has a higher score if it is in more activity clusters and in both activity and communication clusters • A host with a high score is treated as a bot • Similarity score between bot hosts h_i and h_j • Two hosts in the same A-clusters and in at least one common C-cluster are clustered together • Each resulting cluster is a botnet
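One way to realize a score with the properties listed above is sketched below. The slide does not give the formula, so the use of Jaccard overlap as the contribution of each cluster pair, the requirement that paired A-clusters come from different activity types, and the unweighted sum are all assumptions for illustration. The names `jaccard` and `botnet_scores` are hypothetical.

```python
def jaccard(a, b):
    """Jaccard overlap of two host sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def botnet_scores(a_clusters, c_clusters):
    """Score hosts by their joint membership in A-plane and C-plane clusters.

    a_clusters: list of (activity_type, set_of_hosts) tuples.
    c_clusters: list of sets of hosts.
    Only hosts that appear in at least one A-cluster receive a score.
    """
    hosts = set()
    for _, members in a_clusters:
        hosts |= members

    scores = {}
    for h in hosts:
        mine = [(t, m) for t, m in a_clusters if h in m]
        score = 0.0
        # More activity clusters (of different types) raise the score.
        for i in range(len(mine)):
            for j in range(i + 1, len(mine)):
                if mine[i][0] != mine[j][0]:
                    score += jaccard(mine[i][1], mine[j][1])
        # Appearing in both an A-cluster and a C-cluster raises the score.
        for _, a_members in mine:
            for c_members in c_clusters:
                if h in c_members:
                    score += jaccard(a_members, c_members)
        scores[h] = score
    return scores
```

Hosts whose score exceeds a chosen threshold would then be passed to the similarity-based clustering step; the threshold itself is a tuning parameter not specified here.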
Integrating BotMiner and SMITE • Sensors • Feature extraction for C-Plane and A-Plane clustering • C-Flow temporal and statistical features • Counting packets and connections between each pair of endpoints: bytes per second, flows per hour, bytes per packet, packets per flow • A-Plane header and payload features • Destination IP addresses and ports, payload bytes/strings • These sensors are not specific to BotMiner
Integrating BotMiner and SMITE • Algorithms • C-plane clustering • Multi-step clustering based on statistical and temporal C-flow features • A-plane clustering • Based on activity-specific similarity measures: e.g., spread of destination IP addresses and ports, Dice’s coefficient of string similarity, and byte frequency or entropy of payload • Bot scoring and botnet clustering methods • Scoring based on participation in C-plane and A-plane clusters • Clustering based on common memberships in the C-plane and A-plane clusters
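Two of the A-plane similarity measures named above, Dice's coefficient and payload entropy, can be sketched in a few lines of Python. Comparing strings by their sets of character bigrams and measuring entropy in bits per byte are assumptions about the details; the slide names the measures without specifying them.

```python
import math
from collections import Counter


def dice_coefficient(a, b):
    """Dice's coefficient over the sets of character bigrams of two strings."""
    x = {a[i:i + 2] for i in range(len(a) - 1)}
    y = {b[i:i + 2] for i in range(len(b) - 1)}
    if not x and not y:
        # Strings too short to have bigrams: fall back to exact equality.
        return 1.0 if a == b else 0.0
    return 2 * len(x & y) / (len(x) + len(y))


def byte_entropy(payload):
    """Shannon entropy of a bytes payload, in bits per byte (0.0 to 8.0)."""
    if not payload:
        return 0.0
    n = len(payload)
    counts = Counter(payload)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

High entropy suggests encrypted or compressed payloads, while a high Dice coefficient between two hosts' payload strings suggests they are sending similar content.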
Integrating BotMiner and SMITE • Correlation • Botnet detection involves both vertical and horizontal analysis/clustering: • Vertical: what activities a host has been involved in • Bot detection • Horizontal: what other hosts have similar (vertical) behavior patterns • Botnet detection • Similar analysis can be applied to other alerts • Improve botnet detection • Understand malicious activities and plans of attacks • Measure the scale of attacks
Network-Based Spam Detection • Filter email based on how it is sent, in addition to simply what is sent. • Network-level properties are less malleable • Hosting or upstream ISP (AS number) • Membership in a botnet (spammer, hosting infrastructure) • Network location of sender and receiver • Set of target recipients
Finding the Right Features • Goal: Sender reputation from a single packet header? • Low overhead • Fast classification • In-network • Perhaps more evasion resistant • Key challenge • What features satisfy these properties and can distinguish spammers from legitimate senders?
Network-Level Features • Single-Packet • AS of sender’s IP • Distance to k nearest senders • Status of email service ports • Geodesic distance • Time of day • Single-Message • Number of recipients • Length of message • Aggregate (Multiple Message/Recipient)
Sender-Receiver Geodesic Distance 90% of legitimate messages travel 2,200 miles or less
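A sender-receiver distance of this kind can be approximated with the haversine great-circle formula, as in the sketch below. It assumes the sender and receiver IP addresses have already been mapped to latitude and longitude by some geolocation source, which is not shown, and it treats the Earth as a sphere of mean radius 3958.8 miles.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; spherical approximation


def geodesic_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))
```

A classifier could use the raw distance as a feature, or a binary feature such as whether the distance exceeds the 2,200-mile figure quoted above.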
Density of Senders in IP Space For spammers, k nearest senders are much closer in IP space
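The density feature can be sketched as the mean numeric distance from a sender's IPv4 address to its k nearest neighbors among other observed senders. Treating an address as a 32-bit integer and averaging the k smallest absolute differences is an assumption about how "distance in IP space" is measured; the function name is hypothetical.

```python
import ipaddress


def mean_knn_distance(sender, others, k):
    """Mean numeric distance from `sender` to its k nearest IPv4 senders.

    sender: dotted-quad IPv4 string.
    others: iterable of dotted-quad IPv4 strings for other observed senders.
    Returns infinity when there are no other senders to compare against.
    """
    target = int(ipaddress.IPv4Address(sender))
    distances = sorted(
        abs(int(ipaddress.IPv4Address(o)) - target) for o in others
    )
    nearest = distances[:k]
    return sum(nearest) / len(nearest) if nearest else float("inf")
```

A small value means the sender sits in a densely populated region of address space, which the slide associates with spammers.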
Local Time of Day at Sender Spammers “peak” at different local times of day
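Computing this feature requires converting the receive timestamp into the sender's local time. A minimal sketch follows, assuming the sender's UTC offset has already been estimated (for example, from IP geolocation, which is not shown); the function name and parameters are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def sender_local_hour(unix_ts, utc_offset_hours):
    """Hour of day (0-23) at the sender for a Unix timestamp.

    utc_offset_hours: the sender's assumed offset from UTC, e.g. -5 or 5.5.
    """
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(unix_ts, tz).hour
```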
Other Network-Level Features • Time-of-day at sender • Upstream AS of sender • Message size (and variance) • Number of recipients (and variance)
Combining Features: RuleFit • Put features into the RuleFit classifier • 10-fold cross validation on one day of query logs from a large spam filtering appliance provider • Comparable performance to SpamHaus • Incorporating into the system can further reduce FPs • Using only network-level features • Completely automated
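The 10-fold cross-validation procedure mentioned above can be sketched independently of the classifier. The helper below only produces train/test index splits; the RuleFit model itself is not shown, and the absence of shuffling or stratification is a simplification that a real evaluation would probably not keep.

```python
def k_fold_indices(n, k=10):
    """Yield (train_indices, test_indices) for k contiguous folds over n items.

    Fold sizes differ by at most one. No shuffling is applied, so callers
    should shuffle their data first if its order is meaningful.
    """
    base, extra = divmod(n, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size
```

Each item appears in exactly one test fold, so averaging the per-fold error rates gives an estimate based on every labeled example.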
Benefits of Whitelisting • Whitelisting top 50 ASes: False positives reduced to 0.14%
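The effect of an AS whitelist on the false positive rate can be illustrated as follows. The record layout `(sender_asn, is_spam, flagged)` and the function name are hypothetical, and the sketch does not reproduce how the top ASes were selected or the figure reported above.

```python
def false_positive_rate(messages, whitelist=frozenset()):
    """Fraction of legitimate messages still flagged as spam.

    messages: iterable of (sender_asn, is_spam, flagged) tuples, where
              is_spam is the ground-truth label and flagged is the
              classifier's decision.
    whitelist: AS numbers whose mail is never treated as spam.
    """
    legitimate = 0
    false_positives = 0
    for asn, is_spam, flagged in messages:
        if is_spam:
            continue
        legitimate += 1
        # A whitelisted sender AS overrides the classifier's decision.
        if flagged and asn not in whitelist:
            false_positives += 1
    return false_positives / legitimate if legitimate else 0.0
```

Note that whitelisting can also let spam through from the whitelisted ASes, so the detection rate should be re-measured alongside the false positive rate.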
Integrating SNARE and SMITE Algorithms/Correlation Sensors
Integration with SMITE • Sensors • Extract network features from traffic • IP addresses • Combine with auxiliary data (routing, time, etc.) • Algorithms • Clustering algorithm to identify behavioral fingerprints • Learning algorithm to classify based on multiple features • Correlation • Clusters formed by aggregating sending behavior observed across multiple sensors • Various features also require input from data collected across collections of IP addresses
SMITE Integration Challenges • Sources of labeled data • SNARE requires clean sources of labeled data for training • Data collection • SNARE’s performance improves when behavior can be observed across multiple domains
SMITE Integration: Current Work • Study pipeline architecture and code • Modify flow-analyzer to dump 5-tuple flow information
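For readers unfamiliar with the term, 5-tuple flow information keys each flow on source address, source port, destination address, destination port, and protocol. The sketch below aggregates packets into such flows; the `FiveTuple` type, the packet record layout, and the output fields are hypothetical and do not describe the actual flow-analyzer output format.

```python
from collections import namedtuple

FiveTuple = namedtuple("FiveTuple", "src_ip src_port dst_ip dst_port protocol")


def aggregate_flows(packets):
    """Aggregate packets into per-5-tuple flow records.

    packets: iterable of (FiveTuple, timestamp, length_in_bytes) tuples.
    Returns a dict mapping each FiveTuple to packet count, byte count,
    and first/last timestamps seen.
    """
    flows = {}
    for key, ts, length in packets:
        rec = flows.setdefault(
            key, {"packets": 0, "bytes": 0, "first": ts, "last": ts}
        )
        rec["packets"] += 1
        rec["bytes"] += length
        rec["first"] = min(rec["first"], ts)
        rec["last"] = max(rec["last"], ts)
    return flows
```

Records of this shape carry enough information to derive per-flow byte and packet counts and durations for later feature extraction.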
SMITE Integration: Phase I • Modify flow-analyzer with SMITE team to generate 5-tuple flow information (mid-March) • Spam/scan detection, flow aggregation in BotMiner; Spam feature extraction in SNARE (end of March) • Clustering and correlation in BotMiner; Classifier in SNARE (end of April)
SMITE Integration: Phase II • Evaluate performance of BotMiner and SNARE • How many hours to process one day of traffic, or what is the “lag” time between event and detection? • Design real-time detection algorithms • A two-tier system: off-line module outputs lists of suspicious hosts, and real-time module inspects all packets of these hosts; or, off-line module outputs clusters • Design algorithms to handle asymmetric traffic • Cluster on each direction of traffic and cross-correlate
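The first two-tier option above can be sketched as a small class in which the off-line module periodically publishes a list of suspicious hosts and the real-time module selects only packets involving those hosts for full inspection. The class and method names are hypothetical, and this is an illustration of the idea rather than the planned design.

```python
class TwoTierDetector:
    """Real-time filter driven by an off-line list of suspicious hosts."""

    def __init__(self):
        self.suspicious = set()

    def update_from_offline(self, hosts):
        """Replace the suspicious-host list with the latest off-line output."""
        self.suspicious = set(hosts)

    def should_inspect(self, src_ip, dst_ip):
        """Return True if either endpoint of the packet is a suspicious host."""
        return src_ip in self.suspicious or dst_ip in self.suspicious
```

For example, after `update_from_offline(["10.0.0.7"])`, a packet from 10.0.0.7 to any destination would be selected for inspection while traffic between other hosts would pass through uninspected.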