
Netflow and Botnets

This study explores how to identify botnets by analyzing Netflow data, focusing on client-to-client traffic that is not P2P. It covers the methodology, the problems with Netflow data, and the strategy. Current status: basic tools are built, and visualization is being used to understand the data.


Presentation Transcript


  1. Netflow and Botnets Steven M. Bellovin Columbia University smb

  2. Hypothesis
     • Most hosts are either clients or servers
     • P2P traffic is an exception
     • Bots talk to other bots, and thus to a command-and-control node
     • By looking for unusual traffic flows (client-to-client traffic that isn’t P2P), we can find bots

  3. Methodology
     • Use Netflow data to identify clients and servers
     • Classify nodes as clients or servers
     • Build a traffic matrix from the data to see which clients talk to which other clients
     • Exclude P2P traffic, which is generally identifiable based on flow size
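The traffic-matrix step can be sketched roughly as follows. This is a minimal illustration, not the tooling described in the talk: the function name, the `(src, dst, octets)` tuple layout, the `roles` mapping, and especially the size threshold used to flag P2P flows are all assumptions, since the slides only say that P2P is "generally identifiable based on flow size".

```python
from collections import Counter


def client_to_client_matrix(flows, roles, is_p2p):
    """Sum octets per (src, dst) pair where both ends are classified as clients.

    flows:  iterable of (src_addr, dst_addr, octets) tuples (hypothetical layout)
    roles:  dict mapping address -> "client" or "server" (from a prior classifier)
    is_p2p: caller-supplied predicate on flow size; the slides do not specify one
    """
    matrix = Counter()
    for src, dst, octets in flows:
        # Keep only flows where both endpoints were classified as clients.
        if roles.get(src) != "client" or roles.get(dst) != "client":
            continue
        # Drop flows the predicate considers P2P.
        if is_p2p(octets):
            continue
        matrix[(src, dst)] += octets
    return matrix


# Example with made-up addresses and a made-up 1 MB P2P threshold.
roles = {"10.0.0.1": "client", "10.0.0.2": "client", "10.0.0.9": "server"}
flows = [
    ("10.0.0.1", "10.0.0.2", 400),        # client-to-client, small: kept
    ("10.0.0.1", "10.0.0.9", 5000),       # client-to-server: ignored
    ("10.0.0.2", "10.0.0.1", 9_000_000),  # client-to-client, large: treated as P2P
]
suspects = client_to_client_matrix(flows, roles, is_p2p=lambda octets: octets > 1_000_000)
```

Passing the P2P test in as a predicate keeps the matrix code independent of whichever flow-size rule turns out to work in practice.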

  4. Netflow
     • Originally from Cisco; now implemented by most router vendors
     • Also an IETF “Proposed Standard”
     • Records “flow information” – src/dst pairs (addresses and port numbers), length, timing, etc. – for “connections” through a given router
     • Intended for accounting and for traffic engineering
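For readers unfamiliar with flow records, one way to picture the information listed above is as a small record type. The sketch below is illustrative only: the class and field names are assumptions chosen for readability and do not reproduce any particular Netflow export format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """One unidirectional flow as seen by a router (hypothetical field names)."""
    src_addr: str
    dst_addr: str
    src_port: int
    dst_port: int
    protocol: int   # IP protocol number, e.g. 6 for TCP, 17 for UDP
    packets: int    # packets counted for this flow
    octets: int     # bytes counted for this flow
    start_ms: int   # timestamp of the first packet seen, in milliseconds
    end_ms: int     # timestamp of the last packet seen, in milliseconds

    @property
    def duration_ms(self) -> int:
        # Elapsed time between the first and last observed packet.
        return self.end_ms - self.start_ms


example = Flow("192.0.2.10", "198.51.100.5", 51234, 443, 6, 12, 4096, 1_000, 3_500)
```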

  5. Problems with Netflow
     • Flows are unidirectional; need two records for complete picture
     • This is a consequence of Internet topology; most inter-ISP connections follow asymmetric paths
     • Routers often deliver sampled data; can miss flow start/end packets
     • Does not give unambiguous indication of client versus server
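Because each record covers only one direction, a tool has to match a flow with its reverse to see a whole conversation. The following is a minimal sketch of that matching, assuming a simplified record with hypothetical field names. With asymmetric routing or sampling, many flows will simply have no visible partner, which is why the function also returns the unmatched ones.

```python
from collections import defaultdict, namedtuple

# Simplified, hypothetical flow record; real exports carry more fields.
Flow = namedtuple("Flow", "src_addr src_port dst_addr dst_port protocol octets")


def flow_key(f):
    return (f.src_addr, f.src_port, f.dst_addr, f.dst_port, f.protocol)


def reverse_key(f):
    # Same endpoints and protocol, with source and destination swapped.
    return (f.dst_addr, f.dst_port, f.src_addr, f.src_port, f.protocol)


def pair_flows(flows):
    """Return (pairs, unmatched): flows matched with their reverse, and the rest."""
    pending = defaultdict(list)  # key -> flows still waiting for a reverse record
    pairs = []
    for f in flows:
        waiting = pending.get(reverse_key(f))
        if waiting:
            pairs.append((waiting.pop(), f))
        else:
            pending[flow_key(f)].append(f)
    unmatched = [f for group in pending.values() for f in group]
    return pairs, unmatched


flows = [
    Flow("10.0.0.1", 50000, "10.0.0.9", 80, 6, 300),
    Flow("10.0.0.9", 80, "10.0.0.1", 50000, 6, 12000),  # reverse of the first
    Flow("10.0.0.2", 51000, "10.0.0.9", 80, 6, 200),    # reverse never observed
]
pairs, unmatched = pair_flows(flows)
```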

  6. Strategy
     • Build tools at Columbia
     • Easy access to machines and data
     • Use existing archive of CU netflow data
     • Unclear if there are botnets present; get classification right first
     • Get other netflow archives (e.g., from predict.org)
     • Bring nominally-working code to AT&T to experiment with large-scale datasets
     • Compare with previous results from AT&T as check on correctness

  7. Node Classification
     • Must use heuristics
     • Flag field in netflow data doesn’t show client vs. server
     • Timestamp not useful because of sampling
     • Current strategy: look at port number distribution
     • Clients usually use ports 48K-64K
     • Considering using node degree
     • But – problems with low-activity hosts?
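A port-distribution heuristic of the kind mentioned above might look like the sketch below. It is only an illustration under stated assumptions: "48K-64K" is read as ports 49152 through 65535, and the function name and both cutoff fractions are invented for the example. The talk itself reports that simple heuristics have not been satisfactory, so this should be read as a starting point rather than a working classifier.

```python
EPHEMERAL_LOW = 48 * 1024        # 49152, lower edge of the "48K-64K" range
EPHEMERAL_HIGH = 64 * 1024 - 1   # 65535, highest valid port number


def classify_host(local_ports, client_cutoff=0.8, server_cutoff=0.2):
    """Guess a host's role from the ports it used on its own side of each flow.

    local_ports:   list of port numbers, one per observed flow
    client_cutoff: assumed fraction of high ports above which we say "client"
    server_cutoff: assumed fraction of high ports below which we say "server"
    """
    if not local_ports:
        return "unknown"  # no flows observed, so no basis for a guess
    high = sum(1 for p in local_ports if EPHEMERAL_LOW <= p <= EPHEMERAL_HIGH)
    share = high / len(local_ports)
    if share >= client_cutoff:
        return "client"
    if share <= server_cutoff:
        return "server"
    return "ambiguous"
```

A host seen on only a handful of flows can land in any bucket by chance, which is the low-activity problem the slide raises. Requiring a minimum number of flows before classifying would be one possible guard.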

  8. Classification is Hard
     • Simple heuristics have not been satisfactory
     • Building visualization tools to help us understand the data

  9. Client: Port Number by Volume

  10. Client: Port Number Scatter Plot

  11. Server: Port Number by Volume

  12. Server: Port Number Scatter Plot

  13. Ambiguous Host

  14. Ambiguous Host Scatter Plot
     • Is this the sort of host we’re looking for?

  15. Current Status
     • Have basic tools built
     • Working with visualization tools to understand the data
     • Next steps:
        • Refine classification algorithms
        • Confirm analysis of bots in sample data
        • Try tools on larger dataset
