Netflow and Botnets

Netflow and Botnets Steven M. Bellovin Columbia University smb

Hypothesis
• Most hosts are either clients or servers
• P2P traffic is an exception
• Bots talk to other bots, and thus to a command-and-control node
• By looking for unusual traffic flows (client-to-client traffic that isn’t P2P) we can find bots

Methodology
• Use Netflow data to identify nodes and classify them as clients or servers
• Build a traffic matrix from the data to see which clients talk to which other clients
• Exclude P2P traffic, which is generally identifiable based on flow size
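A minimal sketch of the traffic-matrix step, assuming hosts have already been labeled as clients or servers. The function name, the `(src, dst, octets)` tuple layout, and the `p2p_min_octets` cutoff are all hypothetical. The slides say only that P2P is identifiable by flow size, so treating large flows as P2P is an assumption made for illustration.

```python
from collections import defaultdict


def client_to_client_matrix(flows, roles, p2p_min_octets=1_000_000):
    """Count flows between pairs of hosts that are both labeled 'client'.

    flows: iterable of (src, dst, octets) tuples (hypothetical layout).
    roles: dict mapping host -> 'client' or 'server'.
    p2p_min_octets: ASSUMED cutoff; flows at or above it are treated as
        P2P and excluded. The real rule would have to come from the data.
    """
    matrix = defaultdict(int)
    for src, dst, octets in flows:
        # Keep only client-to-client traffic.
        if roles.get(src) != "client" or roles.get(dst) != "client":
            continue
        # Drop flows that look like P2P under the assumed size rule.
        if octets >= p2p_min_octets:
            continue
        matrix[(src, dst)] += 1
    return dict(matrix)
```

Each key of the returned dictionary is a (source, destination) pair of clients, and each value is the number of non-P2P flows seen between them. Those remaining pairs are the candidates the hypothesis calls unusual.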

Netflow
• Originally from Cisco; now implemented by most router vendors
• Also an IETF “Proposed Standard”
• Records “flow information” for “connections” through a given router: src/dst pairs (addresses and port numbers), length, timing, etc.
• Intended for accounting and for traffic engineering
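To make the record contents concrete, here is a sketch of one flow record as a Python data class. The field names and the millisecond timestamps are illustrative assumptions, not the layout of any particular Netflow export version.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    """One unidirectional flow as seen by a router (hypothetical layout)."""
    src_addr: str
    dst_addr: str
    src_port: int
    dst_port: int
    protocol: int   # IP protocol number, e.g. 6 for TCP, 17 for UDP
    packets: int
    octets: int     # the "length" of the flow, in bytes
    start_ms: int   # timing: first packet observed (assumed unit: ms)
    end_ms: int     # timing: last packet observed (assumed unit: ms)

    @property
    def duration_ms(self) -> int:
        """Elapsed time between the first and last observed packet."""
        return self.end_ms - self.start_ms

    def key(self):
        """The src/dst pair (addresses and ports) plus protocol."""
        return (self.src_addr, self.src_port,
                self.dst_addr, self.dst_port, self.protocol)
```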

Problems with Netflow
• Flows are unidirectional; two records are needed for a complete picture of a connection
• This is a consequence of Internet topology; most inter-ISP connections follow asymmetric paths
• Routers often deliver sampled data, which can miss flow start/end packets
• Does not give an unambiguous indication of client versus server
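Because each record covers only one direction, a tool has to match a flow with its reverse to see a whole connection. The sketch below shows one way to do that; the `Flow` tuple and `pair_flows` are hypothetical names. With asymmetric routing or sampling, the reverse record may never have been seen, so unmatched flows are returned separately rather than discarded.

```python
from collections import defaultdict, namedtuple

# Hypothetical minimal record: one direction of one connection.
Flow = namedtuple("Flow", "src sport dst dport proto octets")


def pair_flows(flows):
    """Match each flow with a flow going the opposite way.

    Returns (pairs, unmatched). In each pair the first element is simply
    the flow that was seen first; it says nothing about which end is the
    client and which is the server.
    """
    pending = defaultdict(list)  # flows still waiting for their reverse
    pairs = []
    for f in flows:
        reverse_key = (f.dst, f.dport, f.src, f.sport, f.proto)
        waiting = pending.get(reverse_key)
        if waiting:
            pairs.append((waiting.pop(0), f))
        else:
            pending[(f.src, f.sport, f.dst, f.dport, f.proto)].append(f)
    unmatched = [f for group in pending.values() for f in group]
    return pairs, unmatched
```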

Strategy
• Build tools at Columbia
• Easy access to machines and data
• Use existing archive of CU Netflow data
• Unclear whether any botnets are present; get classification right first
• Get other Netflow archives (e.g., from predict.org)
• Bring nominally working code to AT&T to experiment with large-scale datasets
• Compare with previous results from AT&T as a check on correctness

Node Classification
• Must use heuristics
• Flag field in Netflow data doesn’t show client vs. server
• Timestamp not useful because of sampling
• Current strategy: look at port number distribution
• Clients usually use ports 48K-64K
• Considering using node degree
• But: problems with low-activity hosts?
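The port-distribution heuristic could be sketched as follows. Only the 48K-64K range comes from the slide; the thresholds, the minimum flow count, and the labels are assumptions chosen for illustration. The `unknown` label is one possible way to handle the low-activity hosts the slide worries about.

```python
EPHEMERAL_LOW = 48 * 1024        # 49152, the "48K" bound from the slide
EPHEMERAL_HIGH = 64 * 1024 - 1   # 65535, the top of the port space


def classify_host(local_ports, client_threshold=0.8,
                  server_threshold=0.2, min_flows=10):
    """Guess a host's role from the ports it used on its own side of flows.

    local_ports: iterable with one port number per observed flow.
    All three tuning parameters are ASSUMED values, not measured ones.
    """
    ports = list(local_ports)
    if len(ports) < min_flows:
        return "unknown"  # too little traffic to judge
    high = sum(1 for p in ports if EPHEMERAL_LOW <= p <= EPHEMERAL_HIGH)
    fraction_high = high / len(ports)
    if fraction_high >= client_threshold:
        return "client"
    if fraction_high <= server_threshold:
        return "server"
    return "ambiguous"
```

A host that mostly answers on port 80 would come out as `server`, one that mostly uses high-numbered ports as `client`, and a host with a mix of both lands in `ambiguous`.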

Classification is Hard
• Simple heuristics have not been satisfactory
• Building visualization tools to help us understand the data

Client: Port Number by Volume

Client: Port Number Scatter Plot

Server: Port Number by Volume

Server: Port Number Scatter Plot

Ambiguous Host

Ambiguous Host Scatter Plot
• Is this the sort of host we’re looking for?

Current Status
• Have basic tools built
• Working with visualization tools to understand the data
• Next steps:
  • Refine classification algorithms
  • Confirm analysis of bots in sample data
  • Try tools on larger dataset
