This study introduces BotFinder, a method for detecting bots in network traffic without deep packet inspection. By analyzing command and control (C&C) connections' statistical properties, BotFinder identifies malware infections with a vertical approach. Using machine learning, it extracts features from network traffic flows to model bot behavior. The methodology focuses on statistical analysis rather than packet inspection, making it an effective and non-intrusive bot detection technique. BotFinder is evaluated using representative malware families on datasets capturing lab and ISP network traffic.
BotFinder: Finding Bots in Network Traffic Without Deep Packet Inspection
F. Tegeler, X. Fu (University of Göttingen), G. Vigna, C. Kruegel (UCSB)
Motivation
• Sophisticated type of malware: bots
• Multiple bots under a single controller form a botnet
• Distinct characteristic: the command and control (C&C) channel
• Threats raised by bots:
  • Spam
  • Information theft (e.g., credit card data)
  • Identity theft
  • Click fraud
  • Distributed denial of service (DDoS) attacks
• Estimated revenue of a single botnet: $2M-$600M
(Figure: a C&C server controlling victim hosts)
CoNEXT 2012
Challenge
• How to detect bot infections?
• Classically: on the end host, with an antivirus scanner
  • But: requires installation on every machine
• Complementary approach: network-based detection
  • Vertical correlation (single end host) (Rishi, BotHunter, Wurzinger et al., …):
    • Typical behavior (spam, DDoS traffic)
    • Anomaly detection (Giroire et al.)
    • Packet analysis: HTTP structure, payloads, typical signatures
  • Horizontal correlation (multiple end hosts) (BotSniffer, BotMiner, TAMD, …):
    • Two or more hosts exhibit the same malicious behavior
Challenge and Solution Approach
• Existing vertical approaches: typically rely on scanning, spam, or DDoS traffic and require packet inspection.
• Existing horizontal approaches: require multiple hosts in a single domain to be infected; also triggered by noisy activity (e.g., BotMiner).
• Contribution: vertical detection of single-bot infections without packet inspection!
• C&C connections are established frequently to disseminate the botmaster's orders, so they show patterns.
• Use these statistical properties of C&C communication!
Core assumption: periodic behavior!
Methodology
• Basic machine learning approach:
  • Learn about bot behavior: training phase (a)
  • Use learned behavior: detection phase (b)
• Training:
  • Observe malware in a controlled environment
  • Extract flows and build traces
  • Perform statistical analysis to obtain “features”
  • Create models to describe the malware
Methodology – Detection Phase
• Detection:
  • Obtain traffic
  • Perform analysis analogous to training
  • Compare statistical features of the traffic with the models
• During the whole process: no deep packet inspection!
Methodology – Details
• Analysis performed on flows
• A flow is a connection from A to B, described by:
  • Source IP address
  • Destination IP address
  • Source port
  • Destination port
  • Transport protocol ID
  • Start time
  • Duration of connection
  • Number of bytes
  • Number of packets
• This information is easy to obtain in real-world environments! Example: NetFlow
Methodology – Details cont’d
• Trace: chronologically ordered sequence of flows
• Represents long-term communication behavior!
(Figure: example trace in two dimensions, time and duration)
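To make the flow and trace notions concrete, here is a minimal Python sketch. The `Flow` class and `build_traces` helper are hypothetical names; splitting "number of bytes" into source and destination bytes follows the feature list used later, and grouping flows by source IP, destination IP, destination port, and protocol is an assumption about what forms one trace.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """A NetFlow-style record describing one connection from A to B."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: int        # transport protocol ID (6 = TCP, 17 = UDP)
    start: float      # start time, seconds
    duration: float   # connection duration, seconds
    src_bytes: int
    dst_bytes: int
    packets: int


def build_traces(flows):
    """Group flows into traces, each ordered chronologically by start time.

    Assumed grouping key: (source IP, destination IP, destination port, protocol);
    the source port is ignored because it changes with every new connection.
    """
    traces = defaultdict(list)
    for f in flows:
        traces[(f.src_ip, f.dst_ip, f.dst_port, f.proto)].append(f)
    for trace in traces.values():
        trace.sort(key=lambda f: f.start)  # chronological order
    return dict(traces)


# Example: two connections to the same server (out of order) plus one unrelated flow.
flows = [
    Flow("10.0.0.5", "203.0.113.7", 50312, 80, 6, 1200.0, 1.2, 310, 2048, 9),
    Flow("10.0.0.5", "203.0.113.7", 50100, 80, 6, 0.0, 1.1, 300, 2040, 9),
    Flow("10.0.0.5", "198.51.100.2", 50200, 443, 6, 30.0, 5.0, 900, 15000, 30),
]
traces = build_traces(flows)
print([f.start for f in traces[("10.0.0.5", "203.0.113.7", 80, 6)]])  # [0.0, 1200.0]
```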
Distinguishing Characteristics
• Bot traffic is more regular than normal, benign traffic!
(Figure: bar chart comparing bot and benign traffic; the lower the bar, the more periodic)
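The slide does not say what the bars measure. Assuming a normalized spread such as the relative standard deviation of the time between flows (an assumption made only for illustration), the contrast between periodic bot traffic and bursty benign traffic can be sketched as follows; the start times are toy values, not measured data.

```python
import statistics


def intervals(starts):
    """Time between consecutive flows of a chronologically ordered trace."""
    return [b - a for a, b in zip(starts, starts[1:])]


def relative_std(values):
    """Standard deviation divided by the mean; 0.0 for a perfectly regular series."""
    mean = statistics.fmean(values)
    return statistics.pstdev(values) / mean if mean else float("inf")


# Toy start times in seconds (illustrative, not measured data).
bot_starts = [0, 600, 1200, 1805, 2400, 3000]   # contacts C&C about every 10 minutes
benign_starts = [0, 40, 900, 910, 2600, 2700]   # bursty, user-driven activity
print(relative_std(intervals(bot_starts)))      # small: highly periodic
print(relative_std(intervals(benign_starts)))   # large: irregular
```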
Methodology – Features
• Use statistical features to describe a trace:
  • Average time between two flows
  • Average duration of flows
  • Average number of source bytes
  • Average number of destination bytes
  • A Fourier transform to detect underlying communication frequencies; more robust than simple averaging
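A sketch of feature extraction for one trace follows. It assumes one particular way of applying the Fourier transform (binning flow start times into a 0/1 signal and taking the strongest non-zero frequency of a naive DFT); BotFinder's exact sampling and peak selection may differ, and `fft_period`, `trace_features`, and `bin_s` are hypothetical names. The example shows why a frequency-based feature can be more robust than a simple average when some polls are missing.

```python
import cmath
import statistics


def fft_period(starts, bin_s=60.0):
    """Dominant communication period (seconds) of a trace, via a discrete Fourier transform.

    Assumptions: flow start times are binned into a 0/1 signal with bin width
    `bin_s`, and the strongest non-zero frequency of a naive O(n^2) DFT is used.
    """
    t0 = starts[0]
    n = int((starts[-1] - t0) // bin_s) + 1
    signal = [0.0] * n
    for t in starts:
        signal[int((t - t0) // bin_s)] = 1.0
    best_k, best_mag = 1, -1.0
    for k in range(1, n // 2 + 1):  # k = 0 is the constant (DC) component; skip it
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * i / n) for i, x in enumerate(signal))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return n * bin_s / best_k


def trace_features(flows):
    """flows: chronologically ordered (start, duration, src_bytes, dst_bytes) tuples."""
    starts = [f[0] for f in flows]
    return {
        "avg_interval": statistics.fmean(b - a for a, b in zip(starts, starts[1:])),
        "avg_duration": statistics.fmean(f[1] for f in flows),
        "avg_src_bytes": statistics.fmean(f[2] for f in flows),
        "avg_dst_bytes": statistics.fmean(f[3] for f in flows),
        "fft_period": fft_period(starts),
    }


# A bot polling every 600 s, with five polls missing (e.g. the host was offline).
starts = [600 * m for m in range(20) if not 5 <= m <= 9]
feats = trace_features([(t, 1.0, 300, 2048) for t in starts])
print(round(feats["avg_interval"]))  # 814: the gap inflates the simple average
print(round(feats["fft_period"]))    # 603: the Fourier peak stays near the true 600 s
```

The naive DFT keeps the sketch dependency-free; for long traces a library FFT such as `numpy.fft.rfft` would be the practical choice.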
Methodology – Models
• Example scenario: multiple binary versions of the same bot family generated traces
• Example, time interval feature:
  • “Intervals of 8, 20, or 210 minutes are typical for this bot.”
• Clusters with low standard deviation are trustworthy representations of malware behavior
• Drop very small (one-element) clusters
(Figure: feature clustering of observed time intervals into cluster centroids)
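The clustering step can be illustrated on a single feature. The gap-based rule below is a simplified substitute, not the clustering algorithm BotFinder uses; `cluster_1d`, `rel_gap`, and `min_size` are hypothetical, the rule assumes positive feature values, and the interval values are illustrative. With these inputs the 912-minute outlier ends up in a one-element cluster and is dropped, while the remaining values form clusters around roughly 8, 21, and 210 minutes.

```python
import statistics


def cluster_1d(values, rel_gap=0.25, min_size=2):
    """Cluster one feature's values (e.g. average intervals from several traces).

    Hypothetical stand-in for BotFinder's clustering: sort the values and start a
    new cluster whenever a value exceeds its predecessor by more than `rel_gap`
    (relative). Clusters smaller than `min_size` are dropped.
    Returns (centroid, standard deviation, size) per remaining cluster.
    """
    if not values:
        return []
    ordered = sorted(values)
    clusters, current = [], [ordered[0]]
    for v in ordered[1:]:
        if v > current[-1] * (1 + rel_gap):
            clusters.append(current)
            current = [v]
        else:
            current.append(v)
    clusters.append(current)
    return [(statistics.fmean(c), statistics.pstdev(c), len(c))
            for c in clusters if len(c) >= min_size]


# Illustrative average time intervals (minutes) from several binaries of one family.
observed = [9, 20, 7.5, 230, 8, 8.2, 20, 18, 22, 210, 24, 912, 190]
for centroid, std, size in cluster_1d(observed):
    print(f"centroid {centroid:.1f} min, std {std:.1f}, size {size}")
```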
Methodology – Model Matching
• Compare a trace to the cluster centers of a malware family model:
  1. If a trace feature “hits” a model: increase the scoring value based on the cluster quality
  2. Take the model with the highest scoring value
  3. If the scoring value > threshold: consider the model matched
• Some more math involved (quality of the matching trace, clustering algorithm, minimal trace length, etc.)
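The matching steps can be sketched as below. The hit criterion (within two standard deviations of a centroid), the quality formula, the threshold value, and the model contents are all assumptions chosen for illustration, not the paper's exact math; `cluster_quality`, `score_trace`, and `match` are hypothetical names.

```python
def cluster_quality(centroid, std):
    """Assumed quality measure: tight clusters (small relative spread) score close to 1."""
    return 1.0 / (1.0 + std / max(abs(centroid), 1e-9))


def score_trace(features, model, tolerance=2.0):
    """Add the best cluster quality for every trace feature that hits a model cluster.

    Assumption: a hit means the value lies within `tolerance` standard deviations
    of a cluster centroid.
    """
    score = 0.0
    for name, value in features.items():
        hits = [cluster_quality(c, s) for c, s in model.get(name, [])
                if abs(value - c) <= tolerance * s]
        score += max(hits, default=0.0)
    return score


def match(features, models, threshold=1.5):
    """Return the best-scoring malware family, or None if its score is below the threshold."""
    family = max(models, key=lambda name: score_trace(features, models[name]))
    return family if score_trace(features, models[family]) > threshold else None


# Hypothetical models: feature name -> list of (centroid, std) clusters.
models = {
    "family_a": {"avg_interval": [(600.0, 20.0)], "avg_dst_bytes": [(2048.0, 100.0)]},
    "family_b": {"avg_interval": [(3600.0, 60.0)]},
}
print(match({"avg_interval": 610.0, "avg_dst_bytes": 2000.0}, models))  # family_a
print(match({"avg_interval": 3590.0, "avg_dst_bytes": 500.0}, models))  # None
```

The second trace does hit one `family_b` cluster, but a single hit scores below the threshold, which mirrors step 3: one matching feature alone is not enough to raise an alarm.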
Evaluation
• Method is implemented in BotFinder
• Six representative malware families
• Dataset LabCapture: 2.5 months of lab traffic from 60 machines
  • Full traffic capture, which allows verification
  • Should contain benign traffic only
• Dataset ISPNetFlow: one month of NetFlow data from a large network
  • Reflects 540 terabytes of data, or 150 megabytes(!) per second of traffic
  • No ground truth, but results can be compared against blacklisted IP addresses and judged for usability
Evaluation – Cross Validation
• Execution:
  • Split the ground-truth malware dataset randomly into a training set and a detection set
  • Mix the detection set with all traces from the LabCapture dataset
  • Train BotFinder on the training set
  • Run BotFinder against the mixed detection set
  • Repeat the experiment 50 times per acceptance threshold
• Result summary:
  • 77% detection rate with few false positives (1 out of 5 million traces)
(Figure: ground-truth malware traces split into training and detection sets; the detection set is mixed with LabCapture traces)
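The cross-validation loop can be expressed generically. `cross_validate`, the 50/50 training split, and the `train`/`detect` callables are hypothetical stand-ins for BotFinder; the toy run uses plain numbers as traces only to show the bookkeeping of detection and false-positive rates.

```python
import random
import statistics


def cross_validate(malware, benign, train, detect, runs=50, train_fraction=0.5, seed=0):
    """Average detection and false-positive rates over repeated random splits.

    `train(traces) -> model` and `detect(model, trace) -> bool` are hypothetical
    callables; `train_fraction` is an assumed parameter. Both the held-out
    malware set and `benign` must be non-empty.
    """
    rng = random.Random(seed)
    det_rates, fp_rates = [], []
    for _ in range(runs):
        shuffled = list(malware)
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_fraction)
        training, detection = shuffled[:cut], shuffled[cut:]
        model = train(training)
        # Held-out malware traces mixed with all benign (lab) traces.
        mixed = [(t, True) for t in detection] + [(t, False) for t in benign]
        flagged = [(is_bot, detect(model, t)) for t, is_bot in mixed]
        hits = sum(1 for is_bot, alarm in flagged if is_bot and alarm)
        false_pos = sum(1 for is_bot, alarm in flagged if not is_bot and alarm)
        det_rates.append(hits / len(detection))
        fp_rates.append(false_pos / len(benign))
    return statistics.fmean(det_rates), statistics.fmean(fp_rates)


# Toy run: "traces" are average intervals; the stand-in model is their mean,
# and a trace is flagged when it lies within 10 s of that mean.
malware = [600 + i for i in range(10)]
benign = [5, 30, 7000]
rates = cross_validate(malware, benign, train=statistics.fmean,
                       detect=lambda model, t: abs(t - model) < 10)
print(rates)  # (1.0, 0.0)
```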
Evaluation – Cross Validation (cont’d)
Evaluation – Comparison to BotHunter
• BotHunter is an optimized Snort intrusion detection system. It requires packet inspection and leverages anomaly detection.
• Many false positives for BotHunter, typically raised by IRC activity or binary downloads.
• Detection results:
  • BotFinder detection rate: 77.5%
  • BotHunter detection rate: 10%
• BotFinder outperformed BotHunter, showing relatively high detection rates and few false positives.
• Caveat: does the experimental setup fail to reproduce elements crucial to BotHunter?*
*: http://www.bothunter.net
Evaluation – ISPNetFlow
• Challenging to analyze, as minimal information (only internal IP ranges) is available
• 542 traces (out of >1 billion traces) are identified as malicious by BotFinder
• On average 14.6 alerts per day
Evaluation – ISPNetFlow (cont’d)
• Speed is sufficient for large networks:
  • 3 min for 15M NetFlow records (~15 min of ISPNetFlow, 800 MB file size)
  • Processing is dominated by feature extraction
  • Easy to parallelize
• Detailed IP address investigation of raised alarms:
  • Comparison of external IPs with publicly available blacklists*
  • Result: 56% of all IPs are known to be malicious!
  • The “false positives” show a large cluster of connections to Apple
  • With Apple whitelisted: 61% of all raised alerts connect to known malicious hosts
• Strong support that BotFinder works!
*: rbls.org
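The blacklist comparison reduces to a simple share computation. In this sketch, `blacklisted_share` is a hypothetical helper, the addresses come from documentation ranges, and 17.0.0.0/8 (a block registered to Apple) stands in for the Apple whitelist; the slides do not say how the whitelisting was actually done.

```python
import ipaddress


def blacklisted_share(alert_dst_ips, blacklist, whitelist_nets=()):
    """Fraction of alerts whose external destination IP appears on a blacklist.

    Alerts to whitelisted networks are removed before computing the share.
    `blacklist` is a set of IP strings; `whitelist_nets` holds CIDR strings.
    """
    nets = [ipaddress.ip_network(n) for n in whitelist_nets]
    kept = [ip for ip in alert_dst_ips
            if not any(ipaddress.ip_address(ip) in net for net in nets)]
    if not kept:
        return 0.0
    return sum(ip in blacklist for ip in kept) / len(kept)


alerts = ["198.51.100.1", "198.51.100.2", "17.1.2.3", "203.0.113.9"]
blacklist = {"198.51.100.1", "198.51.100.2"}
print(blacklisted_share(alerts, blacklist))                  # 0.5
print(blacklisted_share(alerts, blacklist, ["17.0.0.0/8"]))  # 2 of the 3 remaining alerts
```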
Bot Evolution
• Botmasters may try to evade detection by changing communication patterns:
  • Introduction of randomized intervals
  • Introduction of large gaps between flows
  • IP or domain flux (fast-changing C&C servers)
• Randomization impact:
  • Randomizing individual features does not significantly impact detection
(Figure annotation: “Lower limit!”)
FFT Peak Detection with Gaps
Anti-Domain Flux
• Problem: fast C&C domain/IP changes
• Problem: BotFinder can’t create a sufficiently long trace
• Idea:
  • Look at each source IP and compare all of its connections with each other
  • When two connections look very similar, combine them into one!
• Inherently horizontal correlation per source IP!
(Figure: when host A’s C&C IP address changes, the trace “breaks” into subtrace 1, A to C&C IP 1, and subtrace 2, A to C&C IP 2)
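The recombination idea can be sketched for the subtraces of one source IP. The similarity criterion (average interval and average destination bytes within 10% of each other) and the `similar`/`recombine` helpers are assumptions for illustration, not BotFinder's actual rule; the addresses come from documentation ranges.

```python
def similar(a, b, rel_tol=0.1):
    """True if two feature values differ by at most rel_tol (relative)."""
    return abs(a - b) <= rel_tol * max(abs(a), abs(b), 1e-9)


def recombine(subtraces, rel_tol=0.1):
    """Merge subtraces of ONE source IP that look alike (hypothetical sketch).

    Each subtrace is a dict with 'dst', 'avg_interval', 'avg_dst_bytes' and a
    sorted list 'starts'. A merged group keeps the feature values of its first
    subtrace and the union of flow start times in chronological order.
    """
    merged = []
    for sub in sorted(subtraces, key=lambda s: s["starts"][0]):
        for group in merged:
            if (similar(group["avg_interval"], sub["avg_interval"], rel_tol)
                    and similar(group["avg_dst_bytes"], sub["avg_dst_bytes"], rel_tol)):
                group["dsts"].append(sub["dst"])
                group["starts"] = sorted(group["starts"] + sub["starts"])
                break
        else:  # no similar group found: start a new one
            merged.append({"dsts": [sub["dst"]], "avg_interval": sub["avg_interval"],
                           "avg_dst_bytes": sub["avg_dst_bytes"],
                           "starts": list(sub["starts"])})
    return merged


# Host A switches C&C IPs after 1200 s; a third destination is unrelated traffic.
subtraces = [
    {"dst": "192.0.2.10", "avg_interval": 600.0, "avg_dst_bytes": 2000.0, "starts": [0, 600, 1200]},
    {"dst": "192.0.2.20", "avg_interval": 600.0, "avg_dst_bytes": 2050.0, "starts": [1800, 2400, 3000]},
    {"dst": "198.51.100.9", "avg_interval": 4900.0, "avg_dst_bytes": 90000.0, "starts": [100, 5000]},
]
for trace in recombine(subtraces):
    print(trace["dsts"], trace["starts"])
```

Keeping the first subtrace's features is a simplification; a fuller version would recompute the features over the combined trace before matching it against the models.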
Additional Pre-Processing
• How can one check that it is working?
• Split real C&C traces and random other long traces (from real traffic). Does BotFinder recombine them?
• “Low” overhead: 85% increase on the ISPNetFlow dataset
(Figure annotation: “Large distance! Good!”)
Conclusion
• High detection rates (nearly 80%) with few false positives and no need for packet inspection!
• BotFinder shows better results than BotHunter.
• 61% of BotFinder-flagged connections in the ISPNetFlow dataset were destined to known, blacklisted hosts!
• BotFinder is robust against potential evasion strategies.
Questions
• Thank you for your attention!
• Any questions?