BotCop: An Online Botnet Traffic Classifier

BotCop: An Online Botnet Traffic Classifier 鍾錫山 Jan. 4, 2010

Reference • Wei Lu, Mahbod Tavallaee, Goaletsa Rammidi, Ali A. Ghorbani, "BotCop: An Online Botnet Traffic Classifier," in Proc. Seventh Annual Communication Networks and Services Research Conference (CNSR), 2009, pp. 70-77.

Outline • Introduction • Traffic classification • Botnet detection • Experimental evaluation • Conclusions

Introduction • Honeypots: used to capture malware, understand the basic behavior of botnets, and create bot binaries or botnet signatures. • Limitation: honeypots are based on existing botnets and provide no solution for new botnets. • Approaches to detecting botnets automatically: • (1) passive anomaly analysis. • (2) traffic classification.

Hierarchical Framework • At the higher level, all unknown network traffic is labeled and classified into different network application communities: • P2P, HTTP Web, Chat, DataTransfer, Online Games, Mail Communication, Multimedia (streaming and VoIP), and Remote Access. • At the lower level, focusing on each application community, we investigate and apply the temporal-frequent characteristics of network flows to differentiate malicious botnet behavior from normal application traffic.

Traffic Classification • First, we model and generate signatures for more than 470 applications according to the port numbers and protocol specifications of these applications. • Second, concentrating on unknown flows that cannot be identified by signatures, we investigate their temporal-frequent characteristics in order to assign them to the already labeled applications using a decision tree. • Fred-eZone: a free WiFi network in Fredericton, Canada.

Signatures Based Classifier • For most applications, their initial protocol handshake steps are usually different and thus can be used for classification.
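A minimal sketch of handshake-based matching follows. The `SIGNATURES` table and the `classify_by_signature` function are hypothetical names chosen for illustration; the prefixes are well-known protocol openings, not BotCop's actual signature set, which covers more than 470 applications.

```python
# Illustrative handshake prefixes (a hypothetical table, not BotCop's real signatures).
SIGNATURES = [
    ("BitTorrent", b"\x13BitTorrent protocol"),  # BitTorrent peer handshake
    ("SSH", b"SSH-"),                            # SSH version banner
    ("HTTPWeb", b"GET "),                        # HTTP request line
    ("HTTPWeb", b"POST "),
]


def classify_by_signature(payload: bytes) -> str:
    """Return the first application whose handshake prefix matches, else 'unknown'."""
    for app, prefix in SIGNATURES:
        if payload.startswith(prefix):
            return app
    return "unknown"


print(classify_by_signature(b"\x13BitTorrent protocol" + bytes(8)))
print(classify_by_signature(b"GET /index.html HTTP/1.1\r\n"))
print(classify_by_signature(b"\x00\x01\x02"))
```

Flows that fall through to "unknown" are the ones handed to the decision tree based classifier.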

Decision Tree Based Classifier • A general result is that about 40% of flows cannot be classified by the current payload signature based classification method. • Extend n-gram frequency into the temporal domain. • Generate a set of 256-dimensional vectors representing the temporal-frequent characteristics of the 256 ASCII byte values in the payload over a predefined time interval. • The n-gram (n = 1 in particular) is computed over a one-second time interval for both the source flow payload and the destination flow payload.
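The temporal 1-gram idea can be sketched as below. This is an illustration under stated assumptions: packets arrive as `(timestamp, payload)` pairs for one flow direction, and counts are normalised to relative frequencies per window. The function name `temporal_byte_frequencies` and the normalisation step are assumptions, not details taken from the slides.

```python
from collections import defaultdict


def temporal_byte_frequencies(packets, window=1.0):
    """Map each time-window index to a 256-dimensional byte-frequency vector.

    packets: iterable of (timestamp_seconds, payload_bytes) for one flow direction.
    """
    vectors = defaultdict(lambda: [0.0] * 256)
    totals = defaultdict(int)
    for ts, payload in packets:
        w = int(ts // window)      # index of the time window this packet falls in
        vec = vectors[w]
        for b in payload:          # iterating over bytes yields ints in 0..255
            vec[b] += 1
        totals[w] += len(payload)
    # Assumption: normalise counts to relative frequencies within each window.
    for w, total in totals.items():
        if total:
            vectors[w] = [c / total for c in vectors[w]]
    return dict(vectors)


packets = [(0.1, b"AAB"), (0.7, b"B"), (1.2, b"CC")]
freqs = temporal_byte_frequencies(packets)
print(sorted(freqs))            # window indices that contain traffic
print(freqs[0][ord("A")])       # share of byte 'A' in the first second
```

Running the same function separately on source-to-destination and destination-to-source packets gives the two views that the source flow and destination flow classifiers use.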

Temporal-frequent metric for source flow payload of BitTorrent application. Temporal-frequent metric for source flow payload of LimeWire application.

Temporal-frequent metric for source flow payload of HTTPWeb application. Temporal-frequent metric for source flow payload of SecureWeb application.

Profiling Applications • We denote the 256-dimensional n-gram byte distribution as a vector $\langle f_0, f_1, \ldots, f_{255} \rangle$. • $f_i$: the frequency of the $i$-th ASCII character on the flow payload over a time window $t$. • Given n historical known flows for each specific application, we define an n × 256 matrix for profiling applications.
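As a rough illustration of the profiling matrix, the sketch below stacks one byte-distribution row per known flow. The helper names `byte_distribution` and `build_profile` and the sample payloads are hypothetical, and each flow is reduced to a single payload for brevity.

```python
def byte_distribution(payload: bytes) -> list:
    """Relative frequency of each of the 256 byte values in one payload."""
    counts = [0] * 256
    for b in payload:
        counts[b] += 1
    total = len(payload) or 1   # avoid division by zero on empty payloads
    return [c / total for c in counts]


def build_profile(flow_payloads):
    """Stack one byte-distribution row per known flow into an n x 256 matrix."""
    return [byte_distribution(p) for p in flow_payloads]


# Hypothetical known flows for one application.
known_http_flows = [
    b"GET / HTTP/1.1\r\n",
    b"GET /a.css HTTP/1.1\r\n",
    b"POST /login HTTP/1.1\r\n",
]
profile = build_profile(known_http_flows)
print(len(profile), len(profile[0]))  # 3 256
```

Rows from several applications, each tagged with its application label, form the kind of n × 256 input a decision tree learner can be trained on.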

A Typical Decision Tree

Botnet Detection • Botnet behavior: • Response time. • Synchronization.

Botnet Detection Approach • Input: a set of N data objects. • Initialization: each cluster contains only one data instance. • Repeat: find the closest pair of clusters and merge them into a single cluster. • Until: the number of clusters = 2.
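The loop above is bottom-up (agglomerative) clustering, and can be sketched as follows. The slides do not state how the distance between clusters is measured, so this sketch assumes Euclidean distance between cluster centroids; `centroid` and `agglomerate` are illustrative names.

```python
import math


def centroid(cluster):
    """Component-wise mean of the points in a cluster."""
    dim = len(cluster[0])
    return [sum(p[i] for p in cluster) / len(cluster) for i in range(dim)]


def agglomerate(points, target=2):
    """Merge the closest pair of clusters until only `target` clusters remain."""
    clusters = [[p] for p in points]          # initialization: one point per cluster
    while len(clusters) > target:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Assumption: cluster distance = Euclidean distance between centroids.
                d = math.dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]   # merge j into i
        del clusters[j]                           # j > i, so index i stays valid
    return clusters


points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0]]
clusters = agglomerate(points)
print(sorted(len(c) for c in clusters))
```

Stopping at two clusters matches the goal of splitting one application community into a normal group and a suspected botnet group.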

Experimental evaluation • The botnet traffic is collected on a honeypot deployed on a real network and aggregated into 243 flows. • Traffic traces collected over 2 days are used for training, and the real-time traffic flows collected on the 3rd day are used for testing. • The size of the input data for training the decision tree is 11000 × 256. • 11 typical applications belonging to 8 typical application groups.

Applications in training dataset

Distribution of "unknown" application flows • More than 90,000 flows were collected over the testing day and identified as unknown.

Source Flow Based Decision Tree Classifier • Total number of flows correctly identified: 82983 (89.4%)

Destination Flow Based Decision Tree Classifier • Total number of flows correctly identified: 85995 (92.6%)

IRC Application Communities

Conclusions • Unknown applications on the current network are first classified into different application communities. • Then, focusing on each application community, a temporal-frequent characteristic is used to differentiate botnet behavior from normal application traffic. • Open question: how to evaluate the approach on the P2P community and measure its performance on P2P-based botnets?
