350 likes | 524 Views
BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure-Independent Botnet Detection. Guofei Gu , Roberto Perdisci , Junjie Zhang, and Wenke Lee College of Computing, Georgia Institute of Technology USENIX Security '08 Presented by Lei Wu April 13 th , 2009.
E N D
BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure-Independent Botnet Detection GuofeiGu, Roberto Perdisci, Junjie Zhang, and Wenke Lee College of Computing, Georgia Institute of Technology USENIX Security '08 Presented by Lei Wu April 13th, 2009
Outline • Motivation and Background • System description • Experimental analysis • Conclusion
Outline • Motivation and Background • System description • Experimental analysis • Conclusion
Motivation and Background • This paper proposes a general detection framework BotMiner that is independent of botnet Command and Control (C&C) protocol and structure, and requires no a priori knowledge of botnets
Motivation and Background • Bot • A malware instance that runs autonomously and automatically on a compromised computer (zombie) without owner’s consent • Botnet: network of bots controlled by criminals • Definition: “A coordinated group of malware instances that are controlled by a botmaster via some C&C channel” • 25% of Internet PCs are part of a botnet!
Motivation and Background • WhyBotMiner? • Traditionalmethods are not enough. Botnetscan change their C&C content (encryption, etc.), protocols (IRC, HTTP, etc.), structures (P2P, etc.), C&C servers, infection models …
Basic idea • Cluster similar communication traffic and similar malicious traffic, and performs cross cluster correlation to identify the hosts that share both similar communication patterns and similar malicious activity patterns
How does it work? • Revisit the definition of Botnet again • “A coordinated group of malware instances that are controlled by a botmaster via some C&C channel” • We need to monitor two planes • C-plane (C&C communication plane): “who is talking to whom” • A-plane (malicious activity plane): “who is doing what” • Horizontal correlation • Bots are for long-term use • Botnet: communication and activities are coordinated/similar
Outline • Motivation and Background • System description • Experimental analysis • Conclusion
Simplified Architecture A-Plane Monitor + Clustering Network Traffic Report Cross-Plane Correlation C-Plane Monitor + Clustering
A-Plane A-Plane Monitor + Clustering Network Traffic Report Cross-Plane Correlation C-Plane Monitor + Clustering
A-Plane Monitor • Log information on who is doing what • Monitor four types of malicious activities • Scanning • Spamming • Binary downloading • Exploit attempts • Based on Snort, adapt some existing intrusion detection techniques (e.g. BotHunter, PEHunter)
A-Plane Clustering • Two-layer clustering on activity logs
C-Plane A-Plane Monitor + Clustering Network Traffic Report Cross-Plane Correlation C-Plane Monitor + Clustering
C-Plane Monitor • Capture network flows and records information on who is talking to whom • Adapt an efficient network flow capture tool named fcapture, which is based on Judy library • Each flow record contains the following information: time, duration, source IP, source port, destination IP, destination port, and the number of packets and bytes transferred in both directions
C-Plane Clustering • Architecture of the C-plane clustering • First two steps are not critical, however, they can reduce the traffic workload and make the actual clustering process more efficient • In the third step, given an epoch E (typically one day), all TCP/UDP flows that shares the same protocol, source IP, destination IP and port, are aggregated into the same C-flow
Feature Extraction • Extract a number of statistical features from each C-flow and translate them into d-dimensional pattern vectors compute the discrete sample distribution of (currently) four random variables • the number of flows per hour (fph) • the number of packets per flow (ppf) • the average number of bytes per packets (bpp) • the average number of bytes per second (bps) Temporal related statistical distribution information: FPH and BPS Spatial related statistical distribution information: BPP and PPF
Feature Extraction Algorithm • Compute the overall discrete sample distribution of the random variable considering all the C-flows in the traffic for an epoch E, then describe that random variable (approximate) distribution as a vector of 13 elements. • Apply the same algorithm for all four random variables, and therefore we map each C-flow into a pattern vector of d = 52 elements
Two-step Clustering of C-flows • Why multi-step? • Coarse-grained clustering • Using reduced feature space: mean and variance of the distribution of FPH, PPF, BPP, BPS for each C-flow (2*4=8) • Efficient clustering algorithm: X-means • Fine-grained clustering • Using full feature space (13*4=52)
Cross-Plane Correlation A-Plane Monitor + Clustering Network Traffic Report Cross-Plane Correlation C-Plane Monitor + Clustering
Cross-Plane Correlation • Botnet score s(h) for every host h • h will receive a high score if it has performed multiple types of suspicious activities, and if other hosts that were clustered with h also show the same multiple types of activities • Similarity score between host hi and hj • Two hosts in the same A-clusters and in at least one common C-cluster are clustered together
Hierarchical clustering • Use the Davies-Bouldin (DB) validation index to find the best dendrogram cut, which produces the most compact and well separated clusters
Outline • Motivation and Background • System description • Experimental analysis • Conclusion
Outline • Motivation and Background • System description • Experimental analysis • Conclusion
Limitation and Discussion • Evading C-plane monitoring and clustering • Misuse whitelist • Manipulate communication patterns • Evading A-plane monitoring and clustering • Very stealthy activity • Individualize bots’ communication/activity • Evading cross-plane analysis • Extremely delayed task
Contribution • Propose a detection framework which is independent of botnet C&C protocol and structure, and requires no a priori knowledge of specific botnets • Build a prototype system based on the general detection framework, and evaluate it with multiple real-world network traces including normal traffic and several real-world botnet traces
Weakness • Offline system • Long time data collection and analysis • No incremental ability of analysis • The experiment is not convincing enough • Only shows the system performance on day-2, what about the other days? • Not a real “real world experiment”
Improvement • Fast detection and online analysis • More efficient clustering, more robust features • More experiments in different and real network environment
Reference • Sides of the paper in USENIX Security’08 • http://faculty.cs.tamu.edu/guofei/paper/botMiner-Security08-slides.pdf • Sad Planet, Kayak Adventure. Botnets on the Rampage • http://birdhouse.org/blog/2006/11/16/botnets-on-the-rampage/ • Beware of Potential ConfickorBotNet Chaos • http://thejunction.net/2009/03/25/april-1st-beware-of-potential-botnet-chaos/ • Oracle Data Mining Mining Techniques and Algorithms • http://www.oracle.com/technology/products/bi/odm/odm_techniques_algorithms.html