This paper presents a wide-scale botnet detection and characterization approach using anomaly-based passive analysis algorithms. The system can detect IRC botnet controllers running on any random port without the need for known signatures or captured binaries.
Wide-scale Botnet Detection and Characterization Anestis Karasaridis, Brian Rexroad, David Hoeflin
Introduction • The master host is the perpetrator's computer. It issues commands that are relayed to the bots via the controller (often an IRC server).
Contributions • An anomaly-based passive analysis algorithm that detects IRC botnet controllers. • A false-positive rate of less than 2%. • Detection of IRC botnet controllers running on any random port, without the need for known signatures or captured binaries.
Data Collection • Transport layer flow summary data are used instead of packet-level analysis • Reduces some privacy protection concerns. • Reduces the amount of data to be processed. • Scalable to large networks, since all network devices can generate flow data. • Flow record data are collected from a large number of geographically and end-point diverse circuits to a central location.
Detection of Botnet Controllers • Aggregation of trigger events, identification of hosts with suspicious behaviour, and selection of flows. • Identification of Candidate Controller Conversations. • Analysis of Candidate Controller Conversation records. • Validation of Controllers.
Detection of Botnet Controllers • Aggregation of trigger events, identification of hosts with suspicious behaviour, and selection of flows. • Reports of suspicious host activities are generated by internal upstream systems. • Aggregate the trigger events. • Search for and fetch the flow records in which the suspected hosts appear.
Detection of Botnet Controllers • Identification of Candidate Controller Conversations. • Search flow records. • Identify connections to typical IRC ports (e.g. 6667, 6668). • Identify connections to hub servers/ports periodically. • Identify connections to servers whose traffic is similar to a flow model for IRC traffic that represents typical command and control activity.
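The first of these criteria (connections from suspected hosts to typical IRC ports) can be sketched as a simple filter over flow records. This is a minimal illustration, not the authors' implementation: the `Flow` record layout, the `candidate_conversations` helper, and the `TYPICAL_IRC_PORTS` set are hypothetical names, and only the two ports named on the slide are included.

```python
from dataclasses import dataclass

# Hypothetical flow-summary record; real flow exports carry more fields.
@dataclass(frozen=True)
class Flow:
    src: str        # client (suspected bot) address
    dst: str        # remote server address
    dst_port: int   # remote server port
    proto: str      # "tcp" or "udp"
    packets: int
    octets: int

# Only the ports named on the slide; an assumption, not a complete list.
TYPICAL_IRC_PORTS = frozenset({6667, 6668})

def candidate_conversations(flows, suspects, ports=TYPICAL_IRC_PORTS):
    """Keep flows that start at a suspected host and go to a typical IRC port."""
    return [f for f in flows if f.src in suspects and f.dst_port in ports]

flows = [
    Flow("10.0.0.5", "192.0.2.1", 6667, "tcp", 12, 900),
    Flow("10.0.0.5", "192.0.2.9", 80, "tcp", 40, 52000),
    Flow("10.0.0.7", "192.0.2.1", 6667, "tcp", 10, 700),
]
candidates = candidate_conversations(flows, suspects={"10.0.0.5"})
```

The other two criteria (hub servers and similarity to an IRC flow model) need additional state and are not covered by this sketch.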
Detection of Botnet Controllers • Analysis of Candidate Controller Conversation records. • Calculation of the number of unique suspected bots for a given remote server address/port. • This allows the analysis to focus on the larger botnets. • Calculation of the distances between the traffic to remote server ports and the model traffic, where N_s = 4 is the number of statistics, N_m = 3 is the number of metrics (flows-per-address, packets-per-flow and bytes-per-packet), X_ij is the observed traffic value of statistic j of metric i, and M_ij is the model traffic value of statistic j of metric i.
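Counting unique suspected bots per remote server address/port can be sketched as follows. The tuple layout `(bot, server, port)` and the function name `unique_bots_per_server` are assumptions made for illustration; the distance-to-model calculation is not shown because its formula is not given in the text.

```python
from collections import defaultdict

def unique_bots_per_server(conversations):
    """Count distinct suspected bots for each (server address, server port).

    conversations: iterable of (bot_addr, server_addr, server_port) tuples
    (a hypothetical layout). Returns (key, count) pairs, largest first.
    """
    bots = defaultdict(set)
    for bot, server, port in conversations:
        bots[(server, port)].add(bot)   # a set ignores repeated flows from one bot
    counts = {key: len(members) for key, members in bots.items()}
    # Sort by descending count so the larger candidate botnets come first.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

conversations = [
    ("10.0.0.5", "192.0.2.1", 6667),
    ("10.0.0.7", "192.0.2.1", 6667),
    ("10.0.0.5", "192.0.2.1", 6667),   # repeat flow, same bot
    ("10.0.0.9", "198.51.100.4", 8080),
]
ranked = unique_bots_per_server(conversations)
```

Sorting by count is one way to reflect the slide's point about focusing on the larger botnets first.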
Detection of Botnet Controllers • Analysis of Candidate Controller Conversation records. • Calculation of a heuristic score for the server address/port pairs that remain candidates after the previous conditions (a) and (b). • Idle clients generate flow records that have certain patterns (IRC Ping-Pong messages). • Server uses both TCP and UDP on the suspect port. • Server appears to be serving significant P2P traffic (i.e. it has multiple peers on multiple service ports).
Detection of Botnet Controllers • Validation of Controllers • Correlation with other available data sources (e.g. honeypot-based detection). • Coordination with a customer for validation and mitigation. • Validation of domain names associated with services.
Characterization of Botnets • Objective: Classify the activities of the bots in the presence of background noise traffic • Select the hosts we want to classify. • Examine their traffic and calculate the number of flow records to application-bound ports. • Traffic profile of a host • A vector of application-bound ports ranked by the number of flows observed.
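A traffic profile as described above can be sketched by counting flows per application-bound port and ranking the ports by that count. The function name `traffic_profile`, the optional `top_n` cut-off, and the tie-breaking by port number are assumptions for illustration; the input is assumed to be one port value per flow record, already reduced to the application-bound side.

```python
from collections import Counter

def traffic_profile(app_ports, top_n=None):
    """Return ports ranked by number of flows, most frequent first.

    app_ports: iterable with one application-bound port per flow record.
    Ties are broken by port number (an assumption, for determinism).
    """
    counts = Counter(app_ports)
    ranked = sorted(counts, key=lambda port: (-counts[port], port))
    return ranked if top_n is None else ranked[:top_n]

profile = traffic_profile([80, 25, 80, 445, 25, 80])
```

Here port 80 appears in three flows, port 25 in two, and port 445 in one, so the profile lists them in that order.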
Characterization of Botnets • Similarity of two hosts S(i,j) with vectors v_i and v_j • S(i,j) ∈ [0,1], S(i,j) = S(j,i) • Similarity increases if a port number exists in both vectors • Similarity is a strictly decreasing function of the port rank
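The slide lists the properties of S(i,j) but not its formula. The sketch below is one function that satisfies those properties, a cosine similarity over rank weights 1/rank; it is an assumed stand-in, not the authors' definition. The names `rank_weights` and `similarity` are hypothetical, and each profile is assumed to list a port at most once.

```python
import math

def rank_weights(profile):
    """Map each port to a weight that strictly decreases with its rank (1/rank)."""
    return {port: 1.0 / rank for rank, port in enumerate(profile, start=1)}

def similarity(profile_i, profile_j):
    """Cosine similarity of rank-weighted port vectors (assumed form).

    Symmetric, within [0, 1], grows with each shared port, and a shared
    port contributes less the lower it is ranked in either profile.
    """
    wi, wj = rank_weights(profile_i), rank_weights(profile_j)
    if not wi or not wj:
        return 0.0
    dot = sum(wi[p] * wj[p] for p in wi.keys() & wj.keys())
    norm_i = math.sqrt(sum(w * w for w in wi.values()))
    norm_j = math.sqrt(sum(w * w for w in wj.values()))
    return min(1.0, dot / (norm_i * norm_j))   # clamp floating-point overshoot

same = similarity([80, 25, 445], [80, 25, 445])
disjoint = similarity([80], [25])
shared_top = similarity([80, 25, 445], [80, 53, 22])
shared_low = similarity([25, 445, 80], [53, 22, 80])
```

Two hosts that share their top-ranked port (`shared_top`) score higher than two hosts that share only their third-ranked port (`shared_low`), which mirrors the rank property on the slide.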
Characterization of Botnets • Classification algorithm, given a set of hosts • Calculate the similarity for each pair of hosts and rank the pairs in descending order. • For the pairs with similarities larger than a threshold, go to the next step. • For each pair of hosts, check whether either of them is already grouped. • If neither host in the pair is grouped, create a new group and calculate its traffic profile. • If one of the hosts is already grouped, add the other host to that group. • As new hosts are identified, calculate their similarity to all of the existing groups and allocate them to the group with the highest similarity above the threshold.
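The pairwise grouping steps above can be sketched as a greedy pass over host pairs sorted by similarity. This is a minimal illustration under stated assumptions: `group_hosts` and `jaccard` are hypothetical names, the Jaccard overlap is only a stand-in for the similarity function S(i,j), pairs in which both hosts are already grouped are left unchanged (the slide does not say what to do with them), and the group traffic profile and the later assignment of new hosts are omitted.

```python
from itertools import combinations

def group_hosts(profiles, similarity, threshold):
    """Greedily group hosts whose pairwise similarity exceeds the threshold.

    profiles: dict mapping host -> traffic profile (list of ports).
    similarity: callable taking two profiles and returning a float.
    Returns a list of sets of hosts.
    """
    pairs = sorted(
        ((similarity(profiles[a], profiles[b]), a, b)
         for a, b in combinations(sorted(profiles), 2)),
        key=lambda item: -item[0],          # descending similarity
    )
    group_of, groups = {}, []
    for score, a, b in pairs:
        if score <= threshold:
            break                           # remaining pairs are below threshold
        ga, gb = group_of.get(a), group_of.get(b)
        if ga is None and gb is None:       # neither grouped: start a new group
            groups.append({a, b})
            group_of[a] = group_of[b] = len(groups) - 1
        elif ga is None:                    # only b grouped: add a to b's group
            groups[gb].add(a)
            group_of[a] = gb
        elif gb is None:                    # only a grouped: add b to a's group
            groups[ga].add(b)
            group_of[b] = ga
        # both already grouped: unspecified in the slides, left unchanged
    return groups

def jaccard(p, q):
    """Stand-in similarity: overlap of port sets, ignoring rank."""
    p, q = set(p), set(q)
    return len(p & q) / len(p | q) if p | q else 0.0

profiles = {
    "h1": [80, 25], "h2": [80, 25],
    "h3": [6667], "h4": [6667],
    "h5": [22],
}
groups = group_hosts(profiles, jaccard, threshold=0.5)
```

In this example `h1` and `h2` form one group, `h3` and `h4` form another, and `h5` stays ungrouped because it has no pair above the threshold.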
Quantitative Results • 376 unique controller IP addresses were detected between 8/2006 and 2/2007. • Only 5 of these addresses were false positives. • 6 million unique IP addresses participating in malicious botnets were discovered between 11/2005 and 5/2006. • Since then, about 1 million new bots per month have been discovered. • Observed botnets are very dynamic in nature. • The average bot stays 2-3 days on the same botnet controller.
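As a quick check of the controller figures above, and assuming the false-positive rate is taken as the share of detected controller addresses that turned out to be false positives (the slides do not state the exact definition):

```latex
\[
\frac{5}{376} \approx 0.0133 = 1.33\% < 2\%
\]
```

Under that assumption the counts are consistent with the reported rate of less than 2%.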
Conclusions • Advantages of this approach • Entirely passive and invisible to operators • Scales to the largest of networks • Based on flow data analysis, which limits privacy issues • Has a false positive rate of less than 2% • Helps identify botnets that are most affecting real users (and customers) • Can detect botnets that use encrypted communications • Helps quantify size of botnets, identify and characterize their activities without joining the botnet