Guofei Gu, Roberto Perdisci, Junjie Zhang, and Wenke Lee

BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure-Independent Botnet Detection. Guofei Gu, Roberto Perdisci, Junjie Zhang, and Wenke Lee, College of Computing, Georgia Institute of Technology. USENIX Security '08. Presented by Lei Wu, April 13th, 2009.

Presentation Transcript


  1. BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure-Independent Botnet Detection • Guofei Gu, Roberto Perdisci, Junjie Zhang, and Wenke Lee • College of Computing, Georgia Institute of Technology • USENIX Security '08 • Presented by Lei Wu, April 13th, 2009

  2. Outline • Motivation and Background • System description • Experimental analysis • Conclusion

  3. Outline • Motivation and Background • System description • Experimental analysis • Conclusion

  4. Motivation and Background • This paper proposes a general detection framework BotMiner that is independent of botnet Command and Control (C&C) protocol and structure, and requires no a priori knowledge of botnets

  5. Motivation and Background • Bot • A malware instance that runs autonomously and automatically on a compromised computer (zombie) without owner’s consent • Botnet: network of bots controlled by criminals • Definition: “A coordinated group of malware instances that are controlled by a botmaster via some C&C channel” • 25% of Internet PCs are part of a botnet!

  6. Motivation and Background • Why BotMiner? • Traditional methods are not enough. Botnets can change their C&C content (encryption, etc.), protocols (IRC, HTTP, etc.), structures (P2P, etc.), C&C servers, infection models …

  7. Basic idea • Cluster similar communication traffic and similar malicious traffic, then perform cross-cluster correlation to identify the hosts that share both similar communication patterns and similar malicious activity patterns

  8. How does it work? • Revisit the definition of a botnet • “A coordinated group of malware instances that are controlled by a botmaster via some C&C channel” • We need to monitor two planes • C-plane (C&C communication plane): “who is talking to whom” • A-plane (malicious activity plane): “who is doing what” • Horizontal correlation • Bots are for long-term use • Botnet: communication and activities are coordinated/similar

  9. Outline • Motivation and Background • System description • Experimental analysis • Conclusion

  10. Architecture overview

  11. Simplified Architecture • Network Traffic → A-Plane Monitor + Clustering and C-Plane Monitor + Clustering → Cross-Plane Correlation → Report

  12. A-Plane • Architecture diagram: Network Traffic → A-Plane Monitor + Clustering and C-Plane Monitor + Clustering → Cross-Plane Correlation → Report

  13. A-Plane Monitor • Log information on who is doing what • Monitor four types of malicious activities • Scanning • Spamming • Binary downloading • Exploit attempts • Based on Snort, adapt some existing intrusion detection techniques (e.g. BotHunter, PEHunter)

  14. A-Plane Clustering • Two-layer clustering on activity logs

  15. C-Plane • Architecture diagram: Network Traffic → A-Plane Monitor + Clustering and C-Plane Monitor + Clustering → Cross-Plane Correlation → Report

  16. C-Plane Monitor • Captures network flows and records information on who is talking to whom • Adapts an efficient network flow capture tool named fcapture, which is based on the Judy library • Each flow record contains the following information: time, duration, source IP, source port, destination IP, destination port, and the number of packets and bytes transferred in both directions

  17. C-Plane Clustering • Architecture of the C-plane clustering • The first two steps are not critical, but they reduce the traffic workload and make the actual clustering process more efficient • In the third step, given an epoch E (typically one day), all TCP/UDP flows that share the same protocol, source IP, destination IP and port are aggregated into the same C-flow
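
The aggregation step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the fcapture or BotMiner code: the `Flow` class, its field names, and the sample addresses are hypothetical, a `proto` field is added because the grouping key needs it, and "port" in the key is read as the destination port.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """One flow record (hypothetical field names)."""
    time: float       # start time, seconds since the start of the epoch
    duration: float   # seconds
    proto: str        # "tcp" or "udp" (assumed field, needed for the key)
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    n_packets: int    # packets in both directions
    n_bytes: int      # bytes in both directions


def aggregate_c_flows(flows):
    """Group flows sharing protocol, source IP, destination IP and port."""
    c_flows = defaultdict(list)
    for f in flows:
        # Assumption: "port" means the destination port.
        key = (f.proto, f.src_ip, f.dst_ip, f.dst_port)
        c_flows[key].append(f)
    return dict(c_flows)


# Made-up flows for one epoch.
flows = [
    Flow(0.0, 2.0, "tcp", "10.0.0.5", 40001, "198.51.100.7", 6667, 10, 900),
    Flow(60.0, 1.5, "tcp", "10.0.0.5", 40002, "198.51.100.7", 6667, 8, 700),
    Flow(90.0, 0.5, "udp", "10.0.0.5", 40003, "198.51.100.7", 53, 2, 160),
]
c_flows = aggregate_c_flows(flows)
for key, members in c_flows.items():
    print(key, len(members))
```

The source port is left out of the key on purpose: it usually changes on every connection, so including it would split one host-to-server conversation into many single-flow groups.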

  18. Feature Extraction • Extract a number of statistical features from each C-flow and translate them into d-dimensional pattern vectors • Compute the discrete sample distribution of (currently) four random variables • the number of flows per hour (fph) • the number of packets per flow (ppf) • the average number of bytes per packet (bpp) • the average number of bytes per second (bps) • Temporal statistical distribution information: fph and bps • Spatial statistical distribution information: bpp and ppf
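
As a rough illustration of the four variables, the sketch below derives raw samples of fph, ppf, bpp and bps from the flows of one C-flow. The tuple layout, the function name `c_flow_samples`, and the sample values are assumptions made for this example; counting hours with zero flows in fph is also an assumption.

```python
from collections import Counter

# Each flow in one C-flow: (start_time_s, duration_s, packets, bytes).
# Made-up values for illustration only.
c_flow = [
    (10.0, 2.0, 10, 1000),
    (1800.0, 4.0, 20, 4000),
    (4000.0, 1.0, 5, 250),
]


def c_flow_samples(c_flow, epoch_hours):
    """Return the raw samples of fph, ppf, bpp and bps for one C-flow."""
    per_hour = Counter(int(start // 3600) for start, _, _, _ in c_flow)
    # Assumption: hours without any flow contribute a sample of 0.
    fph = [per_hour.get(h, 0) for h in range(epoch_hours)]
    ppf = [pkts for _, _, pkts, _ in c_flow]
    bpp = [nbytes / pkts for _, _, pkts, nbytes in c_flow if pkts > 0]
    bps = [nbytes / dur for _, dur, _, nbytes in c_flow if dur > 0]
    return {"fph": fph, "ppf": ppf, "bpp": bpp, "bps": bps}


samples = c_flow_samples(c_flow, epoch_hours=3)
print(samples)
```

Each list is a sample of one random variable; the distribution of each sample, not the individual values, is what gets encoded into the pattern vector.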

  19. Feature Extraction Algorithm • Compute the overall discrete sample distribution of the random variable, considering all the C-flows in the traffic for an epoch E, then describe that random variable's (approximate) distribution as a vector of 13 elements • Apply the same algorithm to all four random variables, and therefore map each C-flow into a pattern vector of d = 52 elements
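
One way to obtain a 13-element description is to bin each C-flow's samples using 12 quantile boundaries taken from the samples pooled over all C-flows. The sketch below follows that idea; the specific cut points in `QUANTILES`, the nearest-rank quantile rule, and all function names are assumptions for illustration, since the slide does not state them.

```python
from bisect import bisect_left

# Assumed cut points (percent); 12 boundaries give 13 bins.
QUANTILES = [5, 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90]
VARIABLES = ("fph", "ppf", "bpp", "bps")


def quantile_boundaries(pooled, percents=QUANTILES):
    """Bin boundaries from the samples pooled over all C-flows."""
    ordered = sorted(pooled)
    n = len(ordered)
    # Nearest-rank quantile: rank = ceil(p * n / 100), index = rank - 1.
    return [ordered[max(0, (p * n + 99) // 100 - 1)] for p in percents]


def pattern_vector(samples, boundaries):
    """Fraction of one C-flow's samples that falls into each bin."""
    counts = [0] * (len(boundaries) + 1)
    for x in samples:
        counts[bisect_left(boundaries, x)] += 1
    total = len(samples)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def c_flow_vector(samples_by_var, boundaries_by_var):
    """Concatenate the four 13-element vectors into one 52-element vector."""
    vec = []
    for name in VARIABLES:
        vec.extend(pattern_vector(samples_by_var[name], boundaries_by_var[name]))
    return vec


# Made-up pooled samples: the integers 1..100 for every variable.
pooled = list(range(1, 101))
bounds = quantile_boundaries(pooled)
one_var = pattern_vector([1, 7, 95, 100], bounds)
full = c_flow_vector(
    {name: [1, 7, 95, 100] for name in VARIABLES},
    {name: bounds for name in VARIABLES},
)
print(bounds)
print(one_var, len(full))
```

Taking the boundaries from the pooled traffic rather than from each C-flow keeps the bins identical across C-flows, so the resulting vectors are directly comparable.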

  20. Two-step Clustering of C-flows • Why multi-step? • Coarse-grained clustering • Using reduced feature space: mean and variance of the distribution of FPH, PPF, BPP, BPS for each C-flow (2*4=8) • Efficient clustering algorithm: X-means • Fine-grained clustering • Using full feature space (13*4=52)
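
The two-step idea can be sketched as follows: cluster cheaply on the reduced vectors first, then refine each coarse cluster using the full vectors. This is a toy sketch, not the BotMiner implementation. A plain k-means with fixed seeds stands in for X-means (which also chooses the number of clusters itself), the population variance is an assumed choice, and the toy vectors are shorter than the real 8 and 52 dimensions.

```python
from statistics import fmean, pvariance


def reduced_features(samples_by_var):
    """Mean and variance of fph, ppf, bpp, bps: 2 * 4 = 8 values."""
    feats = []
    for name in ("fph", "ppf", "bpp", "bps"):
        xs = samples_by_var[name]
        feats += [fmean(xs), pvariance(xs)]  # assumed: population variance
    return feats


def kmeans(points, k, iters=20):
    """Plain k-means seeded with the first k points (stand-in for X-means)."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [fmean(col) for col in zip(*members)]
    return labels


def two_step(reduced, full, k_coarse, k_fine):
    """Coarse clustering on reduced vectors, then refinement on full vectors."""
    coarse = kmeans(reduced, k_coarse)
    result = [None] * len(full)
    for c in sorted(set(coarse)):
        idx = [i for i, lab in enumerate(coarse) if lab == c]
        fine = kmeans([full[i] for i in idx], min(k_fine, len(idx)))
        for i, f in zip(idx, fine):
            result[i] = (c, f)  # (coarse label, fine label)
    return result


# Toy vectors (2-D and 3-D instead of 8-D and 52-D) for six C-flows.
reduced = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
full = [[0, 0, 0], [5, 5, 5], [0, 0, 1], [9, 9, 9], [9, 9, 9.5], [20, 20, 20]]
labels = two_step(reduced, full, k_coarse=2, k_fine=2)
print(labels)
```

The motivation for the split is cost: the expensive comparison in the full feature space only runs inside each coarse cluster, never across the whole data set.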

  21. Cross-Plane Correlation • Architecture diagram: Network Traffic → A-Plane Monitor + Clustering and C-Plane Monitor + Clustering → Cross-Plane Correlation → Report

  22. Cross-Plane Correlation • Botnet score s(h) for every host h • h will receive a high score if it has performed multiple types of suspicious activities, and if other hosts that were clustered with h also show the same multiple types of activities • Similarity score between host hi and hj • Two hosts in the same A-clusters and in at least one common C-cluster are clustered together
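
The slide does not reproduce the score formula, so the sketch below is only one plausible form consistent with the description: a host gains score when two of its A-clusters of different activity types overlap strongly, and when one of its A-clusters overlaps with one of its C-clusters. The Jaccard overlap, the optional per-activity weights, and all names and host labels are assumptions for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap of two host sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def botnet_score(h, a_clusters, c_clusters, weight=None):
    """a_clusters: list of (activity_type, host_set); c_clusters: list of host sets."""
    weight = weight or {}
    mine_a = [(t, hosts) for t, hosts in a_clusters if h in hosts]
    mine_c = [hosts for hosts in c_clusters if h in hosts]
    score = 0.0
    # Pairs of A-clusters of different activity types that both contain h.
    for (t1, a1), (t2, a2) in combinations(mine_a, 2):
        if t1 != t2:
            score += weight.get(t1, 1.0) * weight.get(t2, 1.0) * jaccard(a1, a2)
    # Every A-cluster / C-cluster pair that contains h.
    for t, a in mine_a:
        for c in mine_c:
            score += weight.get(t, 1.0) * jaccard(a, c)
    return score


# Made-up clusters: h1 and h2 scan, spam, and talk alike; h3 only scans.
a_clusters = [("scan", {"h1", "h2", "h3"}), ("spam", {"h1", "h2"})]
c_clusters = [{"h1", "h2"}, {"h4", "h5"}]
for host in ("h1", "h3", "h4"):
    print(host, round(botnet_score(host, a_clusters, c_clusters), 3))
```

In this toy data h1 scores highest because it appears in two activity clusters and one communication cluster that largely share the same members, while h3 and h4 each appear in only one plane and score zero.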

  23. Hierarchical clustering • Use the Davies-Bouldin (DB) validation index to find the best dendrogram cut, which produces the most compact and well separated clusters
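
For reference, the Davies-Bouldin index averages, over all clusters, the worst ratio of combined within-cluster scatter to between-centroid distance, so lower values indicate more compact and better separated clusters. A minimal sketch using Euclidean distance follows; the function name and the toy 1-D data are assumptions, and choosing the best cut is shown simply as taking the candidate with the lowest index.

```python
from math import dist
from statistics import fmean


def davies_bouldin(clusters):
    """clusters: at least two non-empty lists of equal-length points."""
    centroids = [[fmean(col) for col in zip(*pts)] for pts in clusters]
    # Scatter: mean distance from each point to its cluster centroid.
    scatter = [fmean(dist(p, c) for p in pts)
               for pts, c in zip(clusters, centroids)]
    k = len(clusters)
    worst = []
    for i in range(k):
        worst.append(max(
            (scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
            for j in range(k) if j != i
        ))
    return fmean(worst)


# Two candidate cuts of the same toy data (1-D points as one-element lists).
cut_a = [[[0], [1]], [[10], [11]]]
cut_b = [[[0], [1], [10]], [[11]]]
best = min((cut_a, cut_b), key=davies_bouldin)
print(davies_bouldin(cut_a), davies_bouldin(cut_b))
```

Here `cut_a` wins because both of its clusters are tight and far apart, whereas `cut_b` puts the distant point 10 into the first cluster and inflates its scatter.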

  24. Outline • Motivation and Background • System description • Experimental analysis • Conclusion

  25. Data collected

  26. Results

  27. Outline • Motivation and Background • System description • Experimental analysis • Conclusion

  28. Limitation and Discussion • Evading C-plane monitoring and clustering • Misuse whitelist • Manipulate communication patterns • Evading A-plane monitoring and clustering • Very stealthy activity • Individualize bots’ communication/activity • Evading cross-plane analysis • Extremely delayed task

  29. Related Work

  30. Contribution • Propose a detection framework which is independent of botnet C&C protocol and structure, and requires no a priori knowledge of specific botnets • Build a prototype system based on the general detection framework, and evaluate it with multiple real-world network traces including normal traffic and several real-world botnet traces

  31. Weakness • Offline system • Long time data collection and analysis • No incremental ability of analysis • The experiment is not convincing enough • Only shows the system performance on day-2, what about the other days? • Not a real “real world experiment”

  32. Improvement • Fast detection and online analysis • More efficient clustering, more robust features • More experiments in different and real network environment

  33. Reference • Slides of the paper in USENIX Security '08 • http://faculty.cs.tamu.edu/guofei/paper/botMiner-Security08-slides.pdf • Sad Planet, Kayak Adventure. Botnets on the Rampage • http://birdhouse.org/blog/2006/11/16/botnets-on-the-rampage/ • Beware of Potential Confickor BotNet Chaos • http://thejunction.net/2009/03/25/april-1st-beware-of-potential-botnet-chaos/ • Oracle Data Mining Techniques and Algorithms • http://www.oracle.com/technology/products/bi/odm/odm_techniques_algorithms.html

  34. Question?