Reporter: Fong-Ruei Li

BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure-Independent Botnet Detection. Guofei Gu, Roberto Perdisci, Junjie Zhang, and Wenke Lee. In Proceedings of the 17th USENIX Security Symposium (Security'08), San Jose, CA, 2008.

Presentation Transcript


  1. BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure-Independent Botnet Detection • Guofei Gu, Roberto Perdisci, Junjie Zhang, and Wenke Lee. In Proceedings of the 17th USENIX Security Symposium (Security'08), San Jose, CA, 2008 • Reporter: Fong-Ruei Li • Machine Learning and Bioinformatics Lab

  2. Outline • Introduction • BotMiner: Detection Framework • Problem statement • Architecture overview • Experiments • Conclusion

  3. Introduction • Botnets are becoming one of the most serious threats to Internet security, for example as a source of spam and DDoS attacks • A botnet is a network of compromised machines (bots) that run malware and are controlled by a botmaster

  4. Introduction • Most current botnet detection approaches target either a specific command and control (C&C) protocol (e.g., IRC) or a specific botnet structure (e.g., centralized)

  5. Introduction • Almost all of these approaches are designed to detect botnets that use IRC- or HTTP-based C&C • Rishi detects IRC botnets by using known bot nickname patterns as signatures • Another recent system, BotSniffer, detects C&C activities that rely on centralized servers

  6. Introduction • We need a next-generation botnet detection system that is independent of the C&C protocol and the botnet structure

  7. Problem Statement • A botnet is characterized by: • C&C communication channel • Malicious activities • Botnet structure (centralized or P2P)

  8. Assumptions • We assume that bots within the same botnet exhibit similar malicious activities and similar C&C communication patterns

  9. Architecture overview • Clustering similar communication • Clustering similar malicious activities • Cross-checking

  10. C-plane Monitor • The C-plane monitor captures network flows and records who is talking to whom • Only TCP and UDP flows are considered • Each flow record contains: • Time and duration • Source and destination IP address and port • Number of packets • Bytes transferred
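
The flow record described on this slide could be modeled roughly as follows. This is a minimal sketch: the class name `FlowRecord`, its field names, and the helper `is_monitored` are illustrative assumptions, not identifiers from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    """One C-plane flow record (field names are hypothetical)."""
    start_time: float        # when the flow started, in seconds
    duration: float          # how long the flow lasted, in seconds
    protocol: str            # transport protocol, e.g. "tcp" or "udp"
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    packets: int             # number of packets in the flow
    bytes_transferred: int   # number of bytes in the flow


def is_monitored(record: FlowRecord) -> bool:
    """Keep only TCP and UDP flows, as the slide states."""
    return record.protocol.lower() in {"tcp", "udp"}


example = FlowRecord(0.0, 1.5, "tcp", "10.0.0.5", 51234,
                     "203.0.113.7", 6667, 12, 840)
```

A frozen dataclass is used here only so that records can serve as dictionary keys or set members in later processing steps.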

  11. A-plane Monitor • The A-plane monitor logs who is doing what • It analyzes outbound traffic through the monitored network and detects several types of malicious activity that internal hosts may perform

  12. C-plane Clustering • Responsible for: • Reading the logs generated by the C-plane monitor • Finding clusters of machines that share similar communication patterns

  13. C-plane Clustering - Flow Chart • Filter out irrelevant traffic flows

  14. C-plane Clustering - Basic Filtering • Filter Rule 1 (F1): ignore flows that do not go directly from an internal host to an external host • Filter Rule 2 (F2): ignore flows that contain only one-way traffic
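
Filter rules F1 and F2 might be sketched as below. The internal address range `10.0.0.0/8`, the `Flow` tuple, and its per-direction packet counters `packets_out` and `packets_in` are assumptions made for illustration; the slides do not say how direction is recorded.

```python
import ipaddress
from typing import NamedTuple

# Assumed address range of the monitored network (hypothetical).
INTERNAL_NET = ipaddress.ip_network("10.0.0.0/8")


class Flow(NamedTuple):
    src_ip: str
    dst_ip: str
    packets_out: int   # packets sent by the source (assumed field)
    packets_in: int    # packets sent back by the destination (assumed field)


def is_internal(ip: str) -> bool:
    return ipaddress.ip_address(ip) in INTERNAL_NET


def passes_f1(flow: Flow) -> bool:
    """F1: keep only flows from an internal host to an external host."""
    return is_internal(flow.src_ip) and not is_internal(flow.dst_ip)


def passes_f2(flow: Flow) -> bool:
    """F2: drop flows that carry traffic in one direction only."""
    return flow.packets_out > 0 and flow.packets_in > 0


def basic_filter(flows):
    return [f for f in flows if passes_f1(f) and passes_f2(f)]


flows = [
    Flow("10.0.0.5", "203.0.113.7", 10, 8),   # internal to external, two-way
    Flow("10.0.0.5", "10.0.0.9", 10, 8),      # internal to internal: fails F1
    Flow("10.0.0.5", "203.0.113.7", 3, 0),    # one-way only: fails F2
]
kept = basic_filter(flows)
```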

  15. C-plane Clustering - White Listing • Filter Rule 3 (F3): ignore flows whose destinations are well-known legitimate servers (e.g., Google, Yahoo!)
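
One possible shape for rule F3 is a suffix match against a list of trusted domains. The sketch below assumes each flow has already been annotated with a destination host name under the hypothetical key `dst_host`; a real deployment might instead whitelist IP addresses, and the two domains listed are only the examples named on the slide.

```python
# Example whitelist; the entries are the two services named on the slide.
WHITELIST = {"google.com", "yahoo.com"}


def is_whitelisted(dst_host: str) -> bool:
    """True if the host is a whitelisted domain or one of its subdomains."""
    host = dst_host.lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in WHITELIST)


def apply_f3(flows):
    """F3: drop flows whose destination is a well-known legitimate server."""
    return [f for f in flows if not is_whitelisted(f["dst_host"])]


flows = [
    {"dst_host": "www.google.com"},
    {"dst_host": "notgoogle.com"},
    {"dst_host": "c2.example.net"},
]
remaining = apply_f3(flows)
```

Matching on `"." + domain` rather than on a bare suffix avoids treating a look-alike name such as `notgoogle.com` as whitelisted.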

  16. C-plane Clustering - Aggregation (C-Flow) • Aggregate related flows into communication flows (C-flows) • Given an epoch E, all m TCP/UDP flows that share the same protocol, source IP, destination IP, and destination port are aggregated into the same C-flow
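
The aggregation step amounts to grouping flows by a four-part key. A minimal sketch, assuming flows are plain dictionaries with the hypothetical keys shown and that all input flows already fall within one aggregation period:

```python
from collections import defaultdict


def aggregate_c_flows(flows):
    """Group flows that share protocol, source IP, destination IP and port."""
    c_flows = defaultdict(list)
    for f in flows:
        key = (f["protocol"], f["src_ip"], f["dst_ip"], f["dst_port"])
        c_flows[key].append(f)
    return dict(c_flows)


flows = [
    {"protocol": "tcp", "src_ip": "10.0.0.5", "dst_ip": "203.0.113.7", "dst_port": 6667},
    {"protocol": "tcp", "src_ip": "10.0.0.5", "dst_ip": "203.0.113.7", "dst_port": 6667},
    {"protocol": "udp", "src_ip": "10.0.0.5", "dst_ip": "203.0.113.7", "dst_port": 6667},
]
c_flows = aggregate_c_flows(flows)
```

The source port is deliberately left out of the key, so repeated connections from the same host to the same service end up in one C-flow.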

  17. C-plane Clustering - Vector representation • Extract a number of statistical features from each C-flow ci • Translate them into d-dimensional pattern vectors

  18. C-plane Clustering - Vector representation • For each C-flow, compute the discrete sample distribution of four random variables: 1. The number of flows per hour (fph), computed by counting the TCP/UDP flows in ci that are present in each hour of the epoch E. 2. The number of packets per flow (ppf), computed by summing the total number of packets sent within each TCP/UDP flow in ci.

  19. C-plane Clustering - Vector representation 3. The average number of bytes per packet (bpp): for each TCP/UDP flow fj ∈ ci, divide the total number of bytes transferred within fj by the number of packets sent within fj. 4. The average number of bytes per second (bps): for each fj ∈ ci, divide the total number of bytes transferred within fj by the duration of fj.
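
The four random variables could be sampled from one C-flow along these lines. This is a sketch under assumptions: flows are dictionaries with the hypothetical keys `start_time`, `duration`, `packets`, and `bytes`, times are in seconds, and a flow is counted in the hour in which it starts.

```python
def c_flow_samples(flows, epoch_start, epoch_hours):
    """Return the fph, ppf, bpp and bps samples for one C-flow."""
    # fph: one count per hour of the epoch.
    fph = [0] * epoch_hours
    for f in flows:
        hour = int((f["start_time"] - epoch_start) // 3600)
        if 0 <= hour < epoch_hours:
            fph[hour] += 1
    # ppf: total packets of each flow.
    ppf = [f["packets"] for f in flows]
    # bpp: bytes divided by packets, per flow (skip empty flows).
    bpp = [f["bytes"] / f["packets"] for f in flows if f["packets"] > 0]
    # bps: bytes divided by duration, per flow (skip zero-length flows).
    bps = [f["bytes"] / f["duration"] for f in flows if f["duration"] > 0]
    return {"fph": fph, "ppf": ppf, "bpp": bpp, "bps": bps}


flows = [
    {"start_time": 0.0, "duration": 2.0, "packets": 10, "bytes": 1000},
    {"start_time": 3700.0, "duration": 4.0, "packets": 5, "bytes": 400},
]
samples = c_flow_samples(flows, epoch_start=0.0, epoch_hours=2)
```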

  20. C-plane Clustering - Vector representation • Each distribution is described by 13 intervals: [0, k1], (k1, k2], ..., (k12, ∞) • Quantiles: q5%, q10%, q15%, q20%, q25%, q30%, q40%, q50%, q60%, q70%, q80%, q90% • The quantile ql% of a random variable X is the value q for which P(X < q) = l%
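
Binning a sample distribution into 13 intervals can be sketched as follows, assuming the 12 cut points k1 to k12 have already been computed from the listed quantiles. The function names, the normalization to fractions, and the concatenation order are illustrative assumptions. Four variables with 13 bins each give 4 × 13 = 52 values, which would be consistent with the d=52 features mentioned in the two-step clustering slides.

```python
from bisect import bisect_left

VARIABLES = ("fph", "ppf", "bpp", "bps")


def bin_samples(samples, cut_points):
    """Histogram over [0, k1], (k1, k2], ..., (k12, inf), as fractions."""
    counts = [0] * (len(cut_points) + 1)
    for x in samples:
        # bisect_left returns the first index i with cut_points[i] >= x,
        # so a value equal to k1 lands in the first interval [0, k1].
        counts[bisect_left(cut_points, x)] += 1
    total = len(samples)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def pattern_vector(samples_by_var, cuts_by_var):
    """Concatenate the four 13-bin histograms into one vector."""
    vector = []
    for name in VARIABLES:
        vector.extend(bin_samples(samples_by_var[name], cuts_by_var[name]))
    return vector


cuts = list(range(1, 13))                      # placeholder cut points 1..12
hist = bin_samples([0.5, 1, 1.5, 100], cuts)
vec = pattern_vector({v: [0.5, 1, 1.5, 100] for v in VARIABLES},
                     {v: cuts for v in VARIABLES})
```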

  21. C-plane Clustering - Two-step clustering

  22. C-plane Clustering - Two-step clustering • First step: • Data set: • Coarse-grained clustering on a reduced feature space: the d=52 features are reduced to d’=8 features • X-means clustering algorithm • The result is a set

  23. C-plane Clustering - Two-step clustering • Second step: • All d=52 available features are used to represent the C-flows • X-means clustering algorithm • The result is a set
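
The coarse-then-fine idea can be illustrated with the sketch below. Note the substitution: the slides name X-means, which chooses the number of clusters itself, whereas this example uses a plain k-means with a fixed k and a deterministic farthest-first initialization so that it stays short and dependency-free. All function and parameter names are assumptions.

```python
def dist2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Plain k-means (stand-in for X-means); returns one label per point."""
    # Farthest-first initialization: deterministic, no random seed needed.
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(dist2(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def two_step_cluster(reduced, full, k_coarse, k_fine):
    """Step 1 on reduced features, step 2 on full features per coarse cluster."""
    coarse = kmeans(reduced, k_coarse)
    result = {}
    for c in sorted(set(coarse)):
        idx = [i for i, l in enumerate(coarse) if l == c]
        fine = kmeans([full[i] for i in idx], min(k_fine, len(idx)))
        for i, l in zip(idx, fine):
            result[i] = (c, l)   # (coarse label, fine label) per C-flow index
    return result


reduced = [[0.0], [0.1], [10.0], [10.1]]
full = [[0.0, 0.0], [0.0, 5.0], [10.0, 0.0], [10.0, 0.1]]
result = two_step_cluster(reduced, full, k_coarse=2, k_fine=2)
```

Running the cheap clustering first keeps the expensive full-dimensional pass confined to small groups, which is the motivation for a two-step design.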

  24. A-plane Clustering

  25. Cross-plane Correlation • The idea is to cross-check the clusters in the two planes to find intersections that indicate a host is part of a botnet • To do this, we compute a botnet score s(h) for each host h
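
The cross-checking idea can be illustrated with a simplified overlap score. The formula for s(h) is not captured in this transcript, so the function below is only an assumption-laden stand-in: it sums a Jaccard-style overlap between every A-plane cluster and every C-plane cluster that contain the host, and it ignores any activity weights the original definition may use.

```python
def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def botnet_score(host, a_clusters, c_clusters):
    """Illustrative score: overlap of the host's A-clusters and C-clusters."""
    a_with_host = [a for a in a_clusters if host in a]
    c_with_host = [c for c in c_clusters if host in c]
    return sum(jaccard(a, c) for a in a_with_host for c in c_with_host)


a_clusters = [{"h1", "h2"}]            # hosts with similar malicious activity
c_clusters = [{"h1", "h2", "h3"}]      # hosts with similar communication
score_h1 = botnet_score("h1", a_clusters, c_clusters)
score_h3 = botnet_score("h3", a_clusters, c_clusters)
```

A host that appears in a communication cluster but in no activity cluster, such as `h3` here, receives a score of zero under this simplification.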

  26. Cross-plane Correlation - botnet score

  27. Cross-plane Correlation - similarity

  28. Cross-plane Correlation - similarity • We define a similarity measure between bots hi and hj

  29. Cross-plane Correlation - similarity

  30. Setup and Collection • We set up traffic monitors on a router in the campus network of the College of Computing at Georgia Tech • We ran the C-plane and A-plane monitors continuously for a 10-day period in late 2007

  31. Setup and Collection • Traces generated by executing modified bot code • Traces generated based on Web-based C&C communication • A real-world trace containing two P2P botnets

  32. Evaluation Results • Filtration • Aggregation

  33. Evaluation Results • Two-step clustering

  34. Evaluation Results

  35. Conclusion • We proposed a novel network anomaly-based botnet detection system that is independent of the protocol and structure used by botnets

  36. The end • Thank you for listening
