BotFinder: Finding Bots in Network Traffic Without Deep Packet Inspection

Presentation Transcript


  1. BotFinder: Finding Bots in Network Traffic Without Deep Packet Inspection F. Tegeler, X. Fu (U Goe), G. Vigna, C. Kruegel (UCSB)

  2. Motivation • Bots are a sophisticated type of malware • Multiple bots under a single controller form a botnet • Distinct characteristic: the command and control (C&C) channel • Threats raised by bots: • Spam • Information theft (e.g., credit card data) • Identity theft • Click fraud • Distributed denial of service (DDoS) attacks • Estimated revenue for a single botnet: $2M-$600M • [Figure: C&C server and victim hosts] CoNEXT 2012

  3. Challenge • How to detect bot infections? • Classically: on the end host, with an anti-virus scanner • But: requires installation on every machine • Complementary approach: network-based detection • Vertical correlation (single end host) (Rishi, BotHunter, Wurzinger et al., …): • Typical behavior (spam, DDoS traffic) • Anomaly detection (Giroire et al.) • Packet analysis: HTTP structure, payloads, typical signatures • Horizontal correlation (multiple end hosts) (BotSniffer, BotMiner, TAMD, …): • Two or more hosts show the same malicious behavior

  4. Challenge and Solution Approach • Existing vertical approaches: typically rely on scanning, spam, or DDoS traffic and require packet inspection. • Existing horizontal approaches: require multiple hosts in a single domain to be infected, and are also triggered by noisy activity (e.g., BotMiner). • Contribution: vertical detection of single bot infections without packet inspection! • The botmaster establishes C&C connections frequently to disseminate orders. • C&C connections show patterns. • Use these statistical properties of C&C communication! • Core assumption: periodic behavior!

  5. Methodology • Basic machine learning approach: • Learn about bot behavior: training phase (a) • Use the learned behavior: detection phase (b) • Training: • Observe malware in a controlled environment • Extract flows and build traces • Perform statistical analysis to obtain “features” • Create models to describe malware

  6. Methodology – Detection Phase • Detection: • Obtain traffic • Perform the same analysis as in training • Compare the statistical features of the traffic with the models • During the whole process: • No deep packet inspection!

  7. Methodology – Details • Analysis is performed on flows • A flow is a connection from A to B, described by: • Source IP address • Destination IP address • Source port • Destination port • Transport protocol ID • Start time • Duration of connection • Number of bytes • Number of packets • This information is easy to obtain in real-world environments! Example: NetFlow
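
The flow fields listed on this slide could be held in a small record type. This is a minimal sketch only: the class name `Flow`, the field names, and the comma-separated input format read by `parse_flow` are illustrative assumptions, not the format used by BotFinder or by any NetFlow exporter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """One flow record with the fields named on the slide (names are assumed)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int      # transport protocol ID, e.g. 6 = TCP, 17 = UDP
    start_time: float  # seconds since some epoch
    duration: float    # seconds
    num_bytes: int
    num_packets: int


def parse_flow(line: str) -> Flow:
    """Parse one comma-separated line in the (hypothetical) field order above."""
    f = line.strip().split(",")
    return Flow(f[0], f[1], int(f[2]), int(f[3]), int(f[4]),
                float(f[5]), float(f[6]), int(f[7]), int(f[8]))


flow = parse_flow("10.0.0.5,203.0.113.7,49152,80,6,1000.0,1.5,420,6")
print(flow.dst_ip, flow.duration)
```

Note that no payload field exists in the record, which is what makes the approach independent of deep packet inspection.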

  8. Methodology – Details cont’d • Trace: chronologically ordered sequence of flows. • Represents long-term communication behavior! • [Figure: example trace in two dimensions, time and duration]
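
Building traces from flows can be sketched as a group-and-sort step. The grouping key used here (source IP, destination IP, destination port) is an assumption for illustration; the slide only states that a trace is a chronologically ordered sequence of flows. Flows are plain dictionaries with hypothetical key names.

```python
from collections import defaultdict


def build_traces(flows):
    """Group flows by an assumed key and order each group by start time.

    flows: iterable of dicts with keys src_ip, dst_ip, dst_port, start_time.
    Returns a dict mapping (src_ip, dst_ip, dst_port) to a list of flows.
    """
    traces = defaultdict(list)
    for flow in flows:
        key = (flow["src_ip"], flow["dst_ip"], flow["dst_port"])
        traces[key].append(flow)
    for trace in traces.values():
        trace.sort(key=lambda f: f["start_time"])  # chronological order
    return dict(traces)


flows = [
    {"src_ip": "A", "dst_ip": "B", "dst_port": 80, "start_time": 120.0},
    {"src_ip": "A", "dst_ip": "B", "dst_port": 80, "start_time": 0.0},
    {"src_ip": "A", "dst_ip": "C", "dst_port": 443, "start_time": 30.0},
    {"src_ip": "A", "dst_ip": "B", "dst_port": 80, "start_time": 60.0},
]
traces = build_traces(flows)
print([f["start_time"] for f in traces[("A", "B", 80)]])
```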

  9. Distinguishing Characteristics • Bot traffic is more regular than normal, benign traffic! • [Figure: bar chart; the lower the bar, the more periodic the traffic]

  10. Methodology – Features • Use statistical features to describe a trace! • Average time between two flows. • Average duration of flows. • Average number of source bytes. • Average number of destination bytes. • A Fourier transform to detect underlying communication frequencies. More robust than simple averaging.
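
The five features on this slide can be sketched as follows. This is an illustration under stated assumptions, not BotFinder's implementation: the key names (`start_time`, `duration`, `src_bytes`, `dst_bytes`), the one-second sampling step, and the naive O(n²) discrete Fourier transform standing in for an FFT are all choices made here for a dependency-free example.

```python
import cmath
from statistics import mean


def dominant_period(start_times, step=1.0):
    """Estimate the dominant period of sorted flow start times via a naive DFT."""
    t0 = start_times[0]
    n = int((start_times[-1] - t0) / step) + 1
    signal = [0.0] * n
    for t in start_times:
        signal[int((t - t0) / step)] = 1.0     # 1 where a flow starts
    avg = sum(signal) / n
    signal = [s - avg for s in signal]         # remove the constant component
    best_k, best_mag = None, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(s * cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, s in enumerate(signal))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return None if best_k is None else n * step / best_k


def extract_features(trace):
    """trace: chronologically ordered list of flow dicts (at least two flows)."""
    starts = [f["start_time"] for f in trace]
    intervals = [b - a for a, b in zip(starts, starts[1:])]
    return {
        "avg_interval": mean(intervals),
        "avg_duration": mean(f["duration"] for f in trace),
        "avg_src_bytes": mean(f["src_bytes"] for f in trace),
        "avg_dst_bytes": mean(f["dst_bytes"] for f in trace),
        "fft_period": dominant_period(starts),
    }


# Hypothetical trace: one flow every 10 seconds.
trace = [{"start_time": 10.0 * i, "duration": 2.0,
          "src_bytes": 100, "dst_bytes": 400} for i in range(10)]
print(extract_features(trace))
```

Because the spectrum is evaluated on a finite grid of frequencies, the estimated period is only approximately equal to the true one.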

  11. Methodology – Models • Example scenario: • Traces are generated from multiple binary versions of the same bot family • Example: time interval feature: • “Intervals of 8, 20, or 210 minutes are typical for this bot.” • Clusters with low standard deviation are trustworthy representations of malware behavior • Drop very small (one-element) clusters • [Figure: feature clustering; interval values 912, 24, 9, 20, 7.5, 230, 8, 8.2, 20, 18, 22, and 210 min; cluster centroid labels 17 min and 190 min]
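
A model for one feature could be sketched as a set of clusters with a centroid and a standard deviation each. The clustering rule below (start a new cluster when a sorted value exceeds the previous one by more than a fixed ratio) is a simple stand-in chosen for this example; the slides do not name the clustering algorithm. The input values are the interval labels from the slide's figure, used purely as illustrative data, and all values are assumed to be positive.

```python
from statistics import mean, pstdev


def cluster_1d(values, max_ratio=1.5):
    """Split sorted positive values into clusters at large relative jumps."""
    clusters, current = [], []
    for v in sorted(values):
        if current and v > current[-1] * max_ratio:
            clusters.append(current)
            current = []
        current.append(v)
    if current:
        clusters.append(current)
    return clusters


def build_model(values, max_ratio=1.5):
    """Describe each cluster by centroid and spread; drop one-element clusters."""
    model = []
    for members in cluster_1d(values, max_ratio):
        if len(members) < 2:
            continue  # very small clusters are not trusted
        model.append({"centroid": mean(members),
                      "std": pstdev(members),
                      "size": len(members)})
    return model


intervals_min = [912, 24, 9, 20, 7.5, 230, 8, 8.2, 20, 18, 22, 210]
for cluster in build_model(intervals_min):
    print(cluster)
```

With this rule the single 912-minute value forms its own cluster and is dropped, while the remaining clusters sit near 8, 20, and 220 minutes. A low `std` relative to the centroid would mark a cluster as a more trustworthy description of the family.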

  12. Methodology – Model Matching • Compare a trace to the cluster centers of a malware family model: • 1. If a trace feature “hits” a model: increase the scoring value based on the cluster quality • 2. Take the model with the highest scoring value • 3. If the scoring value > threshold: consider the model matched • Some more math is involved (quality of the matching trace, clustering algorithm, minimal trace length, etc.)
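
The three matching steps can be sketched as below. The slide notes that more math is involved, so everything specific here is an assumption made for illustration: the hit rule (within `hit_width` standard deviations of a centroid, with a 5% floor), the quality weight `exp(-std / centroid)`, and the function names `score_model` and `match`.

```python
import math


def score_model(trace_features, model, hit_width=2.0):
    """Sum a quality weight for every trace feature that hits a cluster."""
    score = 0.0
    for name, value in trace_features.items():
        for cluster in model.get(name, []):
            spread = max(cluster["std"], 0.05 * cluster["centroid"])
            if abs(value - cluster["centroid"]) <= hit_width * spread:
                # Tighter clusters (low relative std) contribute more.
                score += math.exp(-cluster["std"] / cluster["centroid"])
                break
    return score


def match(trace_features, models, threshold):
    """Return (family, score) for the best model, or (None, score) if too low."""
    best_family, best_score = None, 0.0
    for family, model in models.items():
        s = score_model(trace_features, model)
        if s > best_score:
            best_family, best_score = family, s
    if best_score > threshold:
        return best_family, best_score
    return None, best_score


# Hypothetical models for two made-up families.
models = {
    "family_a": {"avg_interval": [{"centroid": 8.0, "std": 0.5}],
                 "avg_duration": [{"centroid": 2.0, "std": 0.1}]},
    "family_b": {"avg_interval": [{"centroid": 210.0, "std": 10.0}]},
}
print(match({"avg_interval": 8.3, "avg_duration": 2.05}, models, threshold=1.5))
print(match({"avg_interval": 55.0, "avg_duration": 30.0}, models, threshold=1.5))
```

Raising `threshold` trades detection rate for fewer false positives, which is the knob varied in the evaluation.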

  13. Evaluation • The method is implemented in BotFinder • Six representative malware families • Dataset LabCapture: 2.5 months of lab traffic with 60 machines • Full traffic capture, which allows verification • Should contain benign traffic only • Dataset ISPNetFlow: one month of NetFlow data from a large network • Reflects 540 terabytes of data or 150 megabytes(!) per second of traffic. • No ground truth, but the results can be compared to blacklisted IP addresses and judged for usability.

  14. Evaluation – Cross Validation • Execution: • Split the ground truth malware dataset randomly into a training set and a detection set • Mix the detection set with all traces from the LabCapture dataset • Train BotFinder on the training set • Run BotFinder against the detection set • Repeat the experiment 50 times per acceptance threshold • Result summary: • 77% detection rate with low false positives (1 out of 5 million traces) • [Figure: training data split into training set and detection set; detection set mixed with LabCapture; train, then detect]
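
The cross-validation loop can be sketched generically. The `train` and `detect` callables, the 50/50 split ratio, and the fixed random seed are placeholders assumed for this example; the slide only specifies a random split, mixing with LabCapture traces, and 50 repetitions per acceptance threshold.

```python
import random


def cross_validate(malware_traces, benign_traces, train, detect,
                   rounds=50, train_fraction=0.5, seed=0):
    """Return (mean detection rate, mean false positive rate) over all rounds."""
    rng = random.Random(seed)
    detection_rates, false_positive_rates = [], []
    for _ in range(rounds):
        shuffled = malware_traces[:]
        rng.shuffle(shuffled)                        # random split
        cut = int(len(shuffled) * train_fraction)
        training_set, detection_set = shuffled[:cut], shuffled[cut:]
        model = train(training_set)
        # The detection set is evaluated together with the benign traces.
        hits = sum(1 for t in detection_set if detect(model, t))
        false_alarms = sum(1 for t in benign_traces if detect(model, t))
        detection_rates.append(hits / len(detection_set))
        false_positive_rates.append(false_alarms / len(benign_traces))
    return sum(detection_rates) / rounds, sum(false_positive_rates) / rounds


# Toy stand-ins: a "trace" is just its average interval in minutes.
def toy_train(traces):
    return sum(traces) / len(traces)


def toy_detect(model, trace):
    return abs(trace - model) < 1.0


rates = cross_validate([10.0, 10.2, 9.9, 10.1], [50.0, 3.0, 70.0],
                       toy_train, toy_detect)
print(rates)
```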

  15. Evaluation – Cross Validation

  16. Evaluation – Comparison to BotHunter • BotHunter is an optimized Snort Intrusion Detection System. It requires packet inspection and leverages anomaly detection. • BotHunter raises many false positives, typically triggered by IRC activity or binary downloads. • Detection results: • BotFinder detection rate: 77.5% • BotHunter detection rate: 10% • BotFinder outperformed BotHunter and shows relatively high detection rates and low false positives. • Open question: does the experimental setup fail to reproduce elements crucial to BotHunter?* *: http://www.bothunter.net

  17. Evaluation – ISPNetFlow • Challenging to analyze, as only minimal information (only internal IP ranges) is available • BotFinder identifies 542 traces (out of more than 1 billion) as malicious • On average 14.6 alerts per day

  18. Evaluation – ISPNetFlow • Speed is sufficient for large networks: • 3 min for 15M NetFlow records (~15 min of ISPNetFlow, 800 MB file size) • Processing is dominated by feature extraction • Easy to parallelize • Detailed IP address investigation of raised alarms: • Comparison of external IPs with publicly available blacklists* • Result: 56% of all IPs are known to be malicious! • The “false positives” show a large cluster of connections to Apple • With Apple whitelisted: 61% of all raised alerts connect to known malicious pages • Strong support that BotFinder works! *: rbls.org

  19. Bot Evolution • Botmasters may try to evade detection by changing communication patterns: • Introduction of randomized intervals • Introduction of large gaps between flows • IP or domain flux (fast-changing C&C servers) • Randomization impact: • Randomizing individual features does not significantly impact detection • [Figure annotation: lower limit]

  20. FFT Peak Detection with Gaps

  21. Anti-Domain Flux • Problem: fast C&C domain/IP changes • Problem: BotFinder can’t create a sufficiently long trace • Idea: • Look at each source IP and compare all of its connections with each other • When two connections look very similar, combine them into one! • Inherently horizontal correlation per source IP! • [Figure: a change of the C&C IP address “breaks” the trace into sub-trace 1 (A to C&C IP 1) and sub-trace 2 (A to C&C IP 2)]
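
The recombination idea can be sketched for the sub-traces of a single source IP. The similarity test used here (average interval and average duration within a 20% relative tolerance) is an assumption for illustration; the slide only says that connections that look very similar are combined. Each sub-trace is assumed to be a chronologically ordered list of at least two flow dictionaries.

```python
from statistics import mean


def summary(trace):
    """Average interval and average duration of one sub-trace."""
    starts = [f["start_time"] for f in trace]
    intervals = [b - a for a, b in zip(starts, starts[1:])]
    return mean(intervals), mean(f["duration"] for f in trace)


def similar(a, b, tolerance=0.2):
    """True if both summaries agree within a relative tolerance (assumed rule)."""
    return all(abs(x - y) <= tolerance * max(abs(x), abs(y), 1e-9)
               for x, y in zip(summary(a), summary(b)))


def merge_subtraces(subtraces, tolerance=0.2):
    """Combine similar sub-traces that share one source IP into longer traces."""
    merged = []
    for trace in sorted(subtraces, key=lambda t: t[0]["start_time"]):
        for candidate in merged:
            if similar(candidate, trace, tolerance):
                candidate.extend(trace)
                candidate.sort(key=lambda f: f["start_time"])
                break
        else:
            merged.append(list(trace))
    return merged


# Host A contacts C&C IP 1, then C&C IP 2, with the same 60 s rhythm.
sub1 = [{"dst_ip": "IP1", "start_time": 60.0 * i, "duration": 1.0} for i in range(4)]
sub2 = [{"dst_ip": "IP2", "start_time": 240.0 + 60.0 * i, "duration": 1.0} for i in range(3)]
other = [{"dst_ip": "web", "start_time": t, "duration": 30.0} for t in (5.0, 12.0, 400.0)]
result = merge_subtraces([sub1, sub2, other])
print([len(t) for t in result])
```

In this toy input the two C&C sub-traces are rejoined into one seven-flow trace, while the unrelated web traffic stays separate.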

  22. Additional Pre-Processing • How can one check that it is working? • Split real C&C traces and other random, long traces (from real traffic). Does BotFinder recombine them? • “Low” overhead: 85% increase in the ISPNetFlow dataset. • [Figure annotation: large distance, good!]

  23. Conclusion • High detection rates (nearly 80%) with low false positives and no need for packet inspection! • BotFinder shows better results than BotHunter. • 61% of BotFinder-flagged connections in the ISPNetFlow dataset were destined to known, blacklisted hosts! • BotFinder is robust against potential evasion strategies.

  24. Questions • Thank you for your attention! • Any questions?