BotFinder: Finding Bots in Network Traffic Without Deep Packet Inspection

F. Tegeler, X. Fu (U Goe), G. Vigna, C. Kruegel (UCSB)

Motivation
• Sophisticated type of malware: bots
• Multiple bots under a single control form a botnet
• Distinct characteristic: the command and control (C&C) channel
• Threats raised by bots:
  • Spam
  • Information theft (e.g., credit card data)
  • Identity theft
  • Click fraud
  • Distributed denial of service (DDoS) attacks
• Estimated revenue for a single botnet: $2M-$600M
[Figure: C&C server and victim hosts]
CoNEXT 2012

Challenge
• How to detect bot infections?
• Classically: on the end host, with an anti-virus scanner
  • But: requires installation on every machine
• Complementary approach: network based
  • Vertical correlation (single end host): Rishi, BotHunter, Wurzinger et al., …
    • Typical behavior (spam, DDoS traffic)
    • Anomaly detection (Giroire et al.)
    • Packet analysis: HTTP structure, payloads, typical signatures
  • Horizontal correlation (multiple end hosts): BotSniffer, BotMiner, TAMD, …
    • Two or more hosts show the same malicious behavior

Challenge and Solution Approach
• Existing vertical approaches: typically rely on scanning, spam, or DDoS traffic and require packet inspection.
• Existing horizontal approaches: require multiple infected hosts in a single domain. Detection is also triggered by noisy activity (e.g., BotMiner).
• Contribution: vertical detection of single bot infections without packet inspection!
• The botmaster establishes C&C connections frequently to disseminate orders; these connections show patterns.
• Use these statistical properties of C&C communication!
• Core assumption: periodic behavior!

Methodology
• Basic machine learning approach:
  • Learn about bot behavior: training phase (a)
  • Use learned behavior: detection phase (b)
• Training:
  • Observe malware in a controlled environment
  • Extract flows and build traces
  • Perform statistical analysis to obtain “features”
  • Create models to describe the malware

Methodology – Detection Phase
• Detection:
  • Obtain traffic
  • Perform analysis analogous to training
  • Compare statistical features of the traffic with the models
• During the whole process: no deep packet inspection!

Methodology – Details
• Analysis is performed on flows
• A flow is a connection from A to B, described by:
  • Source IP address
  • Destination IP address
  • Source port
  • Destination port
  • Transport protocol ID
  • Start time
  • Duration of connection
  • Number of bytes
  • Number of packets
• This information is easy to obtain in real-world environments! Example: NetFlow

Methodology – Details cont’d
• Trace: chronologically ordered sequence of flows.
• Represents long-term communication behavior!
[Figure: example trace in two dimensions, time and duration]
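The flow and trace definitions above can be sketched as a small data structure. This is an illustrative sketch, not BotFinder's code: the `Flow` class, its field names, and `build_traces` are hypothetical, and keying a trace by (source IP, destination IP, destination port, protocol) is an assumption about which flows belong together.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """One connection record, limited to NetFlow-style metadata (no payload)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int      # transport protocol ID, e.g. 6 for TCP
    start_time: float  # seconds since an arbitrary epoch
    duration: float    # seconds
    num_bytes: int
    num_packets: int


def build_traces(flows):
    """Group flows per endpoint pair and sort each group chronologically.

    Assumption: a trace is keyed by (src_ip, dst_ip, dst_port, protocol).
    """
    traces = defaultdict(list)
    for flow in flows:
        key = (flow.src_ip, flow.dst_ip, flow.dst_port, flow.protocol)
        traces[key].append(flow)
    for trace in traces.values():
        trace.sort(key=lambda f: f.start_time)
    return dict(traces)
```

Only header-level fields appear in `Flow`, which mirrors the point of the slides: everything needed is available from flow records without looking at payloads.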

Distinguishing Characteristics
• Bot traffic is more regular than normal, benign traffic!
[Figure: the lower the bar, the more periodic the traffic]

Methodology – Features
• Use statistical features to describe a trace:
  • Average time between two flows
  • Average duration of flows
  • Average number of source bytes
  • Average number of destination bytes
  • A Fourier transform to detect underlying communication frequencies; more robust than simple averaging
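Two of the features listed above, the average interval and the Fourier-based period estimate, might be computed roughly as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the trace is sampled as a 0/1 signal with one sample per `step` seconds, and a naive O(n²) discrete Fourier transform stands in for a real FFT to keep the example dependency-free.

```python
import cmath
from statistics import mean


def average_interval(start_times):
    """Mean gap (seconds) between consecutive flow start times (needs >= 2 flows)."""
    gaps = [b - a for a, b in zip(start_times, start_times[1:])]
    return mean(gaps)


def dominant_period(start_times, step=60.0):
    """Estimate the strongest communication period (seconds) via a naive DFT.

    Assumption: the trace becomes a 0/1 signal, one sample per `step`
    seconds, with 1 wherever a flow starts. Returns None for very short traces.
    """
    t0 = start_times[0]
    n = int((start_times[-1] - t0) // step) + 1
    signal = [0.0] * n
    for t in start_times:
        signal[int((t - t0) // step)] = 1.0
    avg = sum(signal) / n
    centered = [s - avg for s in signal]  # remove the constant (DC) component

    best_k, best_power = None, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, x in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    if best_k is None:
        return None
    return n * step / best_k  # period of the strongest frequency bin
```

The frequency resolution is limited by the trace length, so the estimated period is only approximate for short traces.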

Methodology – Models
• Example scenario: multiple binary versions of the same bot family generated traces
• Example for the time interval feature: “Intervals of 8, 20, or 210 minutes are typical for this bot.”
• Clusters with low standard deviation are trustworthy representations of malware behavior
• Drop very small (one-element) clusters
[Figure: feature clustering of per-trace intervals (in minutes) and the resulting cluster centroids]
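The model-building step can be illustrated with a toy one-dimensional clustering. The slides do not name the clustering algorithm, so the gap-based splitting below is a stand-in chosen for brevity; `cluster_feature`, `max_ratio`, and the sample values in the usage note are hypothetical.

```python
from statistics import mean, pstdev


def cluster_feature(values, max_ratio=1.5):
    """Cluster positive 1-D feature values by splitting at large relative jumps.

    Assumption: a new cluster starts when a value exceeds `max_ratio` times
    the previous (sorted) value. This is a stand-in, not BotFinder's algorithm.
    """
    ordered = sorted(values)
    groups = [[ordered[0]]]
    for v in ordered[1:]:
        if v > max_ratio * groups[-1][-1]:
            groups.append([v])
        else:
            groups[-1].append(v)

    clusters = []
    for g in groups:
        if len(g) < 2:  # drop one-element clusters, as the slide suggests
            continue
        clusters.append({"center": mean(g), "std": pstdev(g), "size": len(g)})
    return clusters
```

For illustrative intervals such as `[8, 8.2, 7.5, 9, 20, 20, 18, 22, 210, 230, 912]` (minutes), this yields three clusters centered near 8, 20, and 220 minutes, and the lone 912-minute value is discarded.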

Methodology – Model Matching
• Compare a trace to the cluster centers of a malware family model:
  1. If a trace feature “hits” a model, increase the scoring value based on cluster quality
  2. Take the model with the highest scoring value
  3. If scoring value > threshold, consider the model matched
• Some more math is involved (quality of the matching trace, clustering algorithm, minimal trace length, etc.)
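The three matching steps above can be sketched as follows. Everything specific here is an assumption made for illustration: the exponential quality weight, the "within `tolerance` standard deviations" hit rule, the 5% floor on the standard deviation, the default threshold, and all function and key names. The slides only state that hits raise a score according to cluster quality and that the best score is compared to a threshold.

```python
import math


def cluster_quality(center, std, beta=2.5):
    """Assumed quality weight: tighter clusters (low std/center) score nearer 1."""
    return math.exp(-beta * std / center)


def score_model(trace_features, model, tolerance=2.0):
    """Add the quality of the best cluster hit per feature, averaged over features.

    Assumed hit rule: the value lies within `tolerance` standard deviations
    of a cluster center (std floored at 5% of the center).
    """
    score = 0.0
    for name, clusters in model.items():
        value = trace_features.get(name)
        if value is None:
            continue
        hits = []
        for c in clusters:
            width = tolerance * max(c["std"], 0.05 * c["center"])
            if abs(value - c["center"]) <= width:
                hits.append(cluster_quality(c["center"], c["std"]))
        if hits:
            score += max(hits)
    return score / len(model)


def match(trace_features, models, threshold=0.5):
    """Return (family, score) for the best model, or (None, score) below threshold."""
    best_family, best_score = None, 0.0
    for family, model in models.items():
        s = score_model(trace_features, model)
        if s > best_score:
            best_family, best_score = family, s
    if best_score > threshold:
        return best_family, best_score
    return None, best_score
```

Here a model is a dictionary mapping a feature name to its list of clusters, each with a `center` and `std`; raising `threshold` trades detection rate for fewer false positives.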

Evaluation
• Method is implemented in BotFinder
• Six representative malware families
• Dataset LabCapture: 2.5 months of lab traffic from 60 machines
  • Full traffic capture, which allows verification
  • Should contain benign traffic only
• Dataset ISPNetFlow: one month of NetFlow data from a large network
  • Reflects 540 terabytes of data, or 150 megabytes(!) per second of traffic
  • No ground truth, but results can be compared to blacklisted IP addresses and judged for usability

Evaluation – Cross Validation
• Execution:
  • Split the ground truth malware dataset randomly into a training set and a detection set
  • Mix the detection set with all traces from the LabCapture dataset
  • Train BotFinder on the training set
  • Run BotFinder against the detection set
  • Repeat the experiment 50 times per acceptance threshold
• Result summary: 77% detection rate with low false positives (1 out of 5 million traces)
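The experiment loop described above might look like the following sketch. The `train` and `detect` callables are placeholders supplied by the caller, and `train_fraction` is an assumed split ratio (the slides do not give one); only the 50 repetitions come from the slide.

```python
import random


def cross_validate(malware_traces, benign_traces, train, detect,
                   runs=50, train_fraction=0.5, seed=0):
    """Repeat the split/train/detect experiment and return the mean rates.

    `train(traces) -> model` and `detect(model, trace) -> bool` are
    caller-supplied placeholders; `train_fraction` is an assumed split ratio.
    """
    rng = random.Random(seed)
    detection_rates, false_positive_rates = [], []
    for _ in range(runs):
        shuffled = list(malware_traces)
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_fraction)
        training_set, detection_set = shuffled[:cut], shuffled[cut:]

        model = train(training_set)
        # Malicious traces held out from training should be flagged ...
        true_pos = sum(1 for t in detection_set if detect(model, t))
        # ... while benign (LabCapture-style) traces should not.
        false_pos = sum(1 for t in benign_traces if detect(model, t))

        detection_rates.append(true_pos / len(detection_set))
        false_positive_rates.append(false_pos / len(benign_traces))
    return sum(detection_rates) / runs, sum(false_positive_rates) / runs
```

Running this once per acceptance threshold gives one (detection rate, false positive rate) point per threshold.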

Evaluation – Cross Validation
[Figure not reproduced]

Evaluation – Comparison to BotHunter
• BotHunter is an optimized Snort intrusion detection system. It requires packet inspection and leverages anomaly detection.
• Many false positives for BotHunter, typically raised by IRC activity or binary downloads.
• Detection results:
  • BotFinder detection rate: 77.5%
  • BotHunter detection rate: 10%
• BotFinder outperformed BotHunter, showing relatively high detection rates and low false positives.
• Open question: does the experimental setup fail to reproduce elements crucial to BotHunter?
* BotHunter: http://www.bothunter.net

Evaluation – ISPNetFlow
• Challenging to analyze, as only minimal information (internal IP ranges) is available
• BotFinder identifies 542 traces (out of more than 1 billion) as malicious
• On average 14.6 alerts per day

Evaluation – ISPNetFlow
• Speed is sufficient for large networks:
  • 3 min for 15M NetFlow records (~15 min of ISPNetFlow, 800 MB file size)
  • Processing is dominated by feature extraction
  • Easy to parallelize
• Detailed IP address investigation of raised alarms:
  • Comparison of external IPs with publicly available blacklists*
  • Result: 56% of all IPs are known to be malicious!
  • The “false positives” show a large cluster of connections to Apple
  • With Apple whitelisted: 61% of all raised alerts connect to known malicious pages
• Strong support that BotFinder works!
* rbls.org

Bot Evolution
• Botmasters may try to evade detection by changing communication patterns:
  • Introduction of randomized intervals
  • Introduction of large gaps between flows
  • IP or domain flux (fast-changing C&C servers)
• Randomization impact: randomizing individual features does not significantly impact detection
[Figure annotation: lower limit]

FFT Peak Detection with Gaps
[Figure not reproduced]

Anti-Domain Flux
• Problem: fast changes of the C&C domain/IP
• Consequence: the trace “breaks” at each change of IP address (subtrace 1: A to C&C IP 1; subtrace 2: A to C&C IP 2), so BotFinder can’t create a sufficiently long trace
• Idea:
  • Look at each source IP and compare all its connections with each other
  • When two connections look very similar, combine them into one!
  • Inherently horizontal correlation per source IP!
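The recombination idea can be sketched as a greedy merge of similar subtraces belonging to one source IP. The similarity test used here (mean inter-flow gap and mean duration within a relative tolerance) and all names are assumptions for illustration; the slides only say that connections that "look very similar" are combined.

```python
from statistics import mean


def summarize(subtrace):
    """Mean inter-flow gap and mean duration of a list of (start, duration) pairs.

    Each subtrace needs at least two flows so that a gap exists.
    """
    starts = [s for s, _ in subtrace]
    gaps = [b - a for a, b in zip(starts, starts[1:])]
    return mean(gaps), mean(d for _, d in subtrace)


def similar(a, b, tol=0.1):
    """Assumed similarity test: both summary values differ by at most `tol` (relative)."""
    return all(abs(x - y) <= tol * max(x, y)
               for x, y in zip(summarize(a), summarize(b)))


def recombine(subtraces, tol=0.1):
    """Greedily merge similar subtraces of ONE source IP into longer traces."""
    groups = []  # each group holds subtraces judged to be the same channel
    for sub in sorted(subtraces, key=lambda s: s[0][0]):
        for group in groups:
            if similar(group[0], sub, tol):
                group.append(sub)
                break
        else:
            groups.append([sub])
    # Flatten every group into one chronologically ordered trace.
    return [sorted(flow for sub in group for flow in sub) for group in groups]
```

Comparing each candidate against the first subtrace of a group, rather than against the merged trace, keeps the gap introduced by the IP change from distorting the summary statistics.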

Additional Pre-Processing
• How can one check that it is working?
  • Split real C&C traces and random other long traces (from real traffic). Does BotFinder recombine them?
• “Low” overhead: 85% increase in the ISPNetFlow.
[Figure annotation: large distance, good!]

Conclusion
• High detection rates (nearly 80%) with low false positives and no need for packet inspection!
• BotFinder shows better results than BotHunter.
• 61% of BotFinder-flagged connections in the ISPNetFlow dataset were destined to known, blacklisted hosts!
• BotFinder is robust against potential evasion strategies.

Questions
• Thank you for your attention!
• Any questions?
