BotFinder: Finding Bots in Network Traffic Without Deep Packet Inspection F. Tegeler, X. Fu (U Goe), G. Vigna, C. Kruegel (UCSB)
Motivation • Sophisticated type of malware: bots • Multiple bots under a single controller form a botnet • Distinct characteristic: command and control (C&C) channel • Threats raised by bots: • Spam • Information theft (e.g., credit card data) • Identity theft • Click fraud • Distributed denial of service (DDoS) attacks • Estimated revenue of a single botnet: $2M-$600M [Figure: C&C server and victim hosts] CoNEXT 2012
Challenge • How to detect bot infections? • Classically: on the end host with an anti-virus scanner • But: requires installation on every machine • Complementary approach: network based • Vertical correlation (single end host) (Rishi, BotHunter, Wurzinger et al., …): • Typical behavior (spam, DDoS traffic) • Anomaly detection (Giroire et al.) • Packet analysis: HTTP structure, payloads, typical signatures • Horizontal correlation (multiple end hosts) (BotSniffer, BotMiner, TAMD, …): • Two or more hosts show the same malicious behavior
Challenge and Solution Approach • Existing vertical approaches: typically rely on scanning, spam, or DDoS traffic and require packet inspection. • Existing horizontal approaches: require multiple infected hosts in a single domain and are also triggered by noisy activity (e.g., BotMiner). • Contribution: vertical detection of single bot infections without packet inspection! • Botmaster establishes C&C connections frequently to disseminate orders. C&C connections show patterns. • Use these statistical properties of C&C communication! • Core assumption: periodic behavior!
Methodology • Basic machine learning approach: • Learn about bot behavior: training phase (a) • Use learned behavior: detection phase (b) • Training: • Observe malware in a controlled environment • Extract flows and build traces • Perform statistical analysis to obtain “features” • Create models to describe malware
Methodology – Detection Phase • Detection: • Obtain traffic • Perform analysis analogous to training • Compare statistical features of the traffic with the models • During the whole process: • No deep packet inspection!
Methodology – Details • Analysis is performed on flows • A flow is a connection from A to B, described by: • Source IP address • Destination IP address • Source port • Destination port • Transport protocol ID • Start time • Duration of connection • Number of bytes • Number of packets • This information is easy to obtain in real-world environments! Example: NetFlow
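The flow fields listed above map naturally onto a small record type. The following is a minimal sketch only: the class name `Flow` and all field names are illustrative assumptions and are not taken from BotFinder or from any NetFlow schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """One connection summary; no payload is stored (names are illustrative)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int      # transport protocol ID, e.g. 6 for TCP
    start_time: float  # seconds since the epoch
    duration: float    # seconds
    num_bytes: int
    num_packets: int


# Example record built from made-up values (documentation IP ranges).
flow = Flow("10.0.0.5", "203.0.113.7", 49152, 80, 6, 1_350_000_000.0, 1.8, 4200, 12)
```

Because only these header-level fields are kept, nothing in the record depends on packet contents.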
Methodology – Details cont’d • Trace: chronologically ordered sequence of flows. • Represents long-term communication behavior! [Figure: example trace in two dimensions, time and duration]
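Building traces from flows can be sketched as a group-and-sort step. The grouping key used here (source IP, destination IP, destination port) is an assumption made for illustration; the slide only states that a trace is a chronologically ordered sequence of flows.

```python
from collections import defaultdict


def build_traces(flows):
    """Group flow tuples into traces and order each trace by start time.

    Each flow is (src_ip, dst_ip, dst_port, start_time, duration).
    The grouping key (src_ip, dst_ip, dst_port) is an assumption.
    """
    traces = defaultdict(list)
    for src_ip, dst_ip, dst_port, start_time, duration in flows:
        traces[(src_ip, dst_ip, dst_port)].append((start_time, duration))
    # Sorting the (start_time, duration) pairs orders each trace chronologically.
    return {key: sorted(points) for key, points in traces.items()}


# Made-up flows, deliberately out of order.
flows = [
    ("10.0.0.5", "203.0.113.7", 80, 1200.0, 2.0),
    ("10.0.0.5", "203.0.113.7", 80, 0.0, 1.5),
    ("10.0.0.5", "198.51.100.9", 443, 30.0, 0.4),
    ("10.0.0.5", "203.0.113.7", 80, 600.0, 1.7),
]
traces = build_traces(flows)
```

Each resulting trace is a list of (time, duration) points, matching the two dimensions used in the slide's example.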
Distinguishing Characteristics • Bot traffic is more regular than normal, benign traffic! • In the figure: the lower the bar, the more periodic.
Methodology – Features • Use statistical features to describe a trace! • Average time between two flows. • Average duration of flows. • Average number of source bytes. • Average number of destination bytes. • A Fourier transform to detect underlying communication frequencies. More robust than simple averaging.
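The difference between the averaging feature and the frequency feature can be illustrated with a small sketch. It uses a naive discrete Fourier transform from the standard library as a stand-in for an FFT, and the sampling scheme (a 0/1 signal with one sample per `step` seconds) is an assumption; the slides do not specify how BotFinder samples a trace.

```python
import cmath
from statistics import mean


def average_interval(start_times):
    """Average time between consecutive flows of a chronologically sorted trace."""
    gaps = [b - a for a, b in zip(start_times, start_times[1:])]
    return mean(gaps)


def dominant_period(start_times, step):
    """Estimate the dominant period with a naive DFT over a 0/1 signal.

    The signal has one sample per `step` seconds and is 1 where a flow starts.
    """
    t0 = start_times[0]
    n = int((start_times[-1] - t0) / step) + 1
    signal = [0.0] * n
    for t in start_times:
        signal[int((t - t0) / step)] = 1.0
    best_k, best_power = 1, -1.0
    for k in range(1, n // 2 + 1):  # skip k = 0, the constant component
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, x in enumerate(signal))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return n * step / best_k  # period in seconds of the strongest frequency


# A bot contacting its C&C server every 600 s, with the flow at t = 1800 s missing.
starts = [0.0, 600.0, 1200.0, 2400.0, 3000.0, 3600.0, 4200.0, 4800.0, 5400.0]
avg = average_interval(starts)
period = dominant_period(starts, step=60.0)
```

In this made-up trace the single missing flow pulls the average interval up to 675 s, while the strongest frequency component still corresponds to a period close to the underlying 600 s.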
Methodology – Models • Example scenario: • Multiple binary versions of the same bot family generated traces • Example: time interval feature: • “Intervals of 8, 20, or 210 minutes are typical for this bot.” • Clusters with low standard deviation are trustworthy representations of malware behavior • Drop very small (one-element) clusters [Figure: feature clustering. Values shown: 912min, 24min, 9min, 20min, 7.5min, 230min, 8min, 8.2min, 20min, 18min, 22min, 210min; cluster centroids; 17min, 190min]
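The model-building idea (cluster one feature across many traces, keep the cluster centers and their spread, drop one-element clusters) can be sketched as follows. The splitting rule is a simple ratio-based stand-in chosen for brevity, not the clustering algorithm used by BotFinder, and `max_ratio` is a hypothetical parameter. The input values are illustrative interval values in minutes.

```python
from statistics import mean, pstdev


def cluster_feature(values, max_ratio=1.5):
    """Cluster 1-D feature values and summarize each cluster.

    Stand-in rule: start a new cluster wherever the next sorted value
    exceeds max_ratio times the previous one.
    """
    ordered = sorted(values)
    clusters = [[ordered[0]]]
    for value in ordered[1:]:
        if value > max_ratio * clusters[-1][-1]:
            clusters.append([value])
        else:
            clusters[-1].append(value)
    # One-element clusters are dropped, as on the slide.
    return [
        {"center": mean(c), "stdev": pstdev(c), "size": len(c)}
        for c in clusters
        if len(c) > 1
    ]


intervals_min = [912, 24, 9, 20, 7.5, 230, 8, 8.2, 20, 18, 22, 210]
model = cluster_feature(intervals_min)
```

With this stand-in rule the sample yields three clusters (around 8, 21, and 220 minutes), and the lone 912-minute value is discarded. A lower `stdev` relative to `center` marks a tighter, more trustworthy cluster.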
Methodology – Model Matching • Compare a trace to the cluster centers of a malware family model: • 1. If a trace feature “hits” a model: increase the scoring value based on cluster quality • 2. Take the model with the highest scoring value • 3. If scoring value > threshold: consider the model matched • Some more math is involved (quality of matching trace, clustering algorithm, minimal trace length, etc.)
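The three matching steps can be sketched in a few lines. Both the hit criterion (within `hit_width` standard deviations of a cluster center) and the quality function (`exp(-stdev / center)`) are assumptions made for this illustration; the slide only says that the score grows with cluster quality. The family names and model values are hypothetical.

```python
import math


def match_score(trace_features, model, hit_width=2.0):
    """Sum cluster-quality scores over all features where the trace hits a cluster."""
    score = 0.0
    for name, value in trace_features.items():
        for center, stdev in model.get(name, []):
            if abs(value - center) <= hit_width * stdev:
                score += math.exp(-stdev / center)  # tighter cluster, higher quality
                break  # count at most one hit per feature
    return score


def best_match(trace_features, models, threshold):
    """Return (family, score) for the best model, or (None, score) if not above threshold."""
    family, score = max(
        ((name, match_score(trace_features, m)) for name, m in models.items()),
        key=lambda item: item[1],
    )
    return (family, score) if score > threshold else (None, score)


# Hypothetical models: feature name -> list of (cluster center, standard deviation).
models = {
    "family_a": {"interval": [(480.0, 30.0), (1200.0, 60.0)], "duration": [(2.0, 0.5)]},
    "family_b": {"interval": [(3600.0, 100.0)], "duration": [(10.0, 1.0)]},
}
trace = {"interval": 1230.0, "duration": 2.4}
family, score = best_match(trace, models, threshold=1.0)
```

Here the trace hits one interval cluster and the duration cluster of `family_a` and nothing in `family_b`, so `family_a` is reported. Raising the threshold trades detection rate for fewer false positives.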
Evaluation • The method is implemented in BotFinder • Six representative malware families • Dataset LabCapture: 2.5 months of lab traffic from 60 machines • Full traffic capture, which allows verification • Should contain benign traffic only • Dataset ISPNetFlow: one month of NetFlow data from a large network • Reflects 540 terabytes of data or 150 megabytes(!) per second of traffic. • No ground truth, but alerts can be compared against blacklisted IP addresses and judged for usability.
Evaluation – Cross Validation • Execution: • Split the ground truth malware dataset randomly into a training set and a detection set • Mix the detection set with all traces from the LabCapture dataset • Train BotFinder on the training set • Run BotFinder against the detection set • Repeat the experiment 50 times per acceptance threshold • Result summary: • 77% detection rate with low false positives (1 out of 5 million traces) [Figure: training data is split into a training set (train) and a detection set, which is mixed with LabCapture (detect)]
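The cross-validation procedure can be sketched as a loop over random splits. This is an outline under stated assumptions: `train` and `detect` are placeholders for the real training and detection steps, the 50/50 split fraction is not given on the slide, and the toy data below is invented purely to make the sketch runnable.

```python
import random
from statistics import mean


def cross_validate(malware, benign, train, detect, runs=50, train_fraction=0.5, seed=0):
    """Repeat a random train/detect split and average detection and false positive rates."""
    rng = random.Random(seed)
    detection_rates, false_positive_rates = [], []
    for _ in range(runs):
        shuffled = list(malware)
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_fraction)
        training_set, detection_set = shuffled[:cut], shuffled[cut:]
        model = train(training_set)
        # The detection set is evaluated together with all benign traces.
        hits = sum(1 for trace in detection_set if detect(model, trace))
        false_alarms = sum(1 for trace in benign if detect(model, trace))
        detection_rates.append(hits / len(detection_set))
        false_positive_rates.append(false_alarms / len(benign))
    return mean(detection_rates), mean(false_positive_rates)


# Toy stand-ins: a trace is reduced to its average interval in seconds.
def toy_train(traces):
    return mean(traces)


def toy_detect(model, trace):
    return abs(trace - model) <= 0.1 * model


malware = [600.0, 610.0, 590.0, 605.0, 595.0, 3000.0]
benign = [37.0, 4100.0, 12.0, 950.0]
rate, fp_rate = cross_validate(malware, benign, toy_train, toy_detect)
```

Running the same loop once per acceptance threshold gives one (detection rate, false positive rate) pair per threshold.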
Evaluation – Cross Validation
Evaluation – Comparison to BotHunter • BotHunter is an optimized Snort Intrusion Detection System. It requires packet inspection and leverages anomaly detection. • Many false positives for BotHunter, typically raised by IRC activity or binary downloads. • Detection results: • BotFinder detection rate: 77.5% • BotHunter detection rate: 10% • BotFinder outperformed BotHunter and shows relatively high detection rates and low false positives. • Open question: does the experimental setup fail to reproduce elements crucial to BotHunter? * • *: http://www.bothunter.net
Evaluation – ISPNetFlow • Challenging to analyze, as only minimal information (only internal IP ranges) is available • 542 traces (out of >1 billion traces) are identified by BotFinder as malicious • On average 14.6 alerts per day
Evaluation – ISPNetFlow • Speed is sufficient for large networks: • 3min for 15M NetFlow records (~15min of ISPNetFlow, 800MB file size) • Processing is dominated by feature extraction • Easy to parallelize • Detailed IP address investigation of raised alarms: • Comparison of external IPs with publicly available blacklists* • Result: 56% of all IPs are known to be malicious! • The “false positives” show a large cluster of connections to Apple • With Apple whitelisted: 61% of all raised alerts connect to known malicious pages • Strong support that BotFinder works! • *: rbls.org
Bot Evolution • Botmasters may try to evade detection by changing communication patterns: • Introduction of randomized intervals • Introduction of large gaps between flows • IP or domain flux (fast changing C&C servers) • Randomization impact: • Randomizing individual features does not significantly impact detection (figure annotation: “Lower limit!”)
FFT Peak Detection with Gaps
Anti-Domain Flux • Problem: fast C&C domain/IP changes • Problem: BotFinder can’t create a sufficiently long trace • Idea: • Look at each source IP and compare all connections with each other • When two connections look very similar, combine them into one! • Inherently horizontal correlation per source IP! [Figure: the trace “breaks” at a change of IP address into subtrace 1 (A to C&C IP 1) and subtrace 2 (A to C&C IP 2)]
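The recombination idea can be sketched as a greedy merge over the subtraces of one source IP. The similarity measure (largest relative difference between feature vectors) and the `max_distance` cutoff are assumptions for illustration; the slides do not state which distance BotFinder uses. Feature values are assumed to be positive.

```python
def relative_distance(a, b):
    """Largest relative difference between two equally long, positive feature vectors."""
    return max(abs(x - y) / max(abs(x), abs(y)) for x, y in zip(a, b))


def recombine(subtraces, max_distance=0.1):
    """Greedily merge subtraces of one source IP whose features look alike.

    Each subtrace is (features, flow_start_times). A merged group keeps the
    features of its first subtrace as the reference for later comparisons.
    """
    merged = []
    for features, starts in subtraces:
        for group in merged:
            if relative_distance(group["features"], features) <= max_distance:
                group["starts"] = sorted(group["starts"] + list(starts))
                break
        else:  # no similar group found: start a new one
            merged.append({"features": features, "starts": list(starts)})
    return merged


# Host A: features are (average interval in s, average duration in s).
subtraces = [
    ((600.0, 1.8), [0.0, 600.0, 1200.0]),  # A to C&C IP 1
    ((610.0, 1.7), [1800.0, 2400.0]),      # A to C&C IP 2, after the IP change
    ((45.0, 30.0), [100.0, 150.0]),        # unrelated connection
]
groups = recombine(subtraces)
```

In this made-up example the two C&C subtraces are joined into one trace of five flows, which is long enough again for feature extraction, while the unrelated connection stays separate.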
Additional Pre-Processing • How can one check that it is working? • Split real C&C traces and mix them with random other, long traces (from real traffic). Does BotFinder recombine them? • “Low” overhead: 85% increase in the ISPNetFlow dataset. (figure annotation: “Large distance! Good!”)
Conclusion • High detection rates, nearly 80%, with low false positives and no need for packet inspection! • BotFinder shows better results than BotHunter. • 61% of BotFinder-flagged connections in the ISPNetFlow dataset were destined to known, blacklisted hosts! • BotFinder is robust against potential evasion strategies.
Questions • Thank you for your attention! • Any questions?