Automating Analysis of Large-Scale Botnet Probing Events Zhichun Li, Anup Goyal, Yan Chen and Vern Paxson* Lab for Internet and Security Technology (LIST) Northwestern University * UC Berkeley / ICSI
Motivation • Administrators ask: does this attack specifically target us? • Can we answer this question with only limited information observed locally in the enterprise? (Figure: botnets probing the IPv4 space, of which the enterprise is one small part.)
Motivation • Can we infer the probe strategy used by botnets? • Can we infer whether a botnet probing attack specifically targets a certain network, or whether we are just part of a larger, indiscriminate attack? • Can we extrapolate botnet global properties given limited local information?
Agenda • Motivation • Basic framework • Discover the botnet probing strategies • Extrapolate global properties • Evaluation • Conclusions
Botnet Probing Events • Big spikes in the number of probers are mainly caused by botnets
System Framework See the paper for subtle system details.
Agenda • Motivation • Basic framework • Discover the botnet probing strategies • Extrapolate global properties • Evaluation • Conclusions
Discover the Botnet Probing Strategies • Use statistical tests to understand probing strategies • Leverage existing statistical tests • Monotonic trend checking: detect whether bots probe the IP space monotonically • Uniformity checking: detect whether bots scan the IP range uniformly • Design our own • Hitlist (liveness) checking: detect whether bots avoid the dark IP space • Dependency checking: do the bots scan independently or are they coordinated?
Hitlist Checking • Configure the sensor to be half darknet and half honeynet • Use the metric θ = (# sources seen in darknet) / (# sources seen in honeynet) • Threshold: 0.5
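The θ metric above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the direction of the comparison (θ below 0.5 is read as hitlist use, because few sources reach the dark half) is an assumption that the slide does not state explicitly.

```python
def hitlist_theta(darknet_sources, honeynet_sources):
    """Ratio of distinct sources seen in the darknet half to those in the honeynet half."""
    dark = len(set(darknet_sources))
    live = len(set(honeynet_sources))
    if live == 0:
        raise ValueError("no sources observed in the honeynet half")
    return dark / live


def uses_hitlist(darknet_sources, honeynet_sources, threshold=0.5):
    # Assumption: theta below the threshold means the bots mostly avoid dark space.
    return hitlist_theta(darknet_sources, honeynet_sources) < threshold


# Hypothetical source addresses (documentation range), not data from the study.
dark = ["192.0.2.1"]
live = ["192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4"]
print(hitlist_theta(dark, live), uses_hitlist(dark, live))
```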
Agenda • Motivation • Basic framework • Discover the botnet probing strategies • Extrapolate global properties • Global scan scope, total # of bots, total # of scans, total scan rate for each bot • Evaluation • Conclusions
Extrapolate Global Properties: Basic Ideas and Validation • Observe packet fields that change in predictable patterns across consecutive probes • IPID: a field in the IP header used for fragment reassembly • Ephemeral port number: the source port used by bots • Both increment by a fixed amount per scan • Validation • IPID continuity: all versions of Windows and MacOS • Ephemeral port number continuity: botnet source code study (Agobot, Phatbot, Spybot, SDbot, rxBot, etc.) • Control experiments with NAT
Estimate Global Scan Rate of Each Bot • Count the IPID and ephemeral port number changes over time • Recover from overflow (wraparound) of the IPID and ephemeral port number • Estimate the rate with linear regression when the correlation coefficient is > 0.99 • Counter overestimation: use the lesser of the two estimates (Figure: IPID plotted against time T)
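The rate-estimation steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, each counter is assumed to wrap at most once between consecutive observations, one counter increment is treated as one scan, and the ephemeral port is unwrapped with the same 16-bit modulus as the IPID, although real port ranges are usually narrower.

```python
import math


def unwrap_counter(values, modulus=65536):
    """Undo wraparound of a counter observed in time order (at most one wrap per step)."""
    unwrapped, offset, prev = [], 0, None
    for v in values:
        if prev is not None and v < prev:
            offset += modulus  # the counter overflowed since the last observation
        unwrapped.append(v + offset)
        prev = v
    return unwrapped


def fit_rate(times, counter_values, min_corr=0.99):
    """Least-squares slope (increments per second), or None if the fit is too weak."""
    n = len(times)
    if n < 3:
        return None
    y = unwrap_counter(counter_values)
    mean_t, mean_y = sum(times) / n, sum(y) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    syy = sum((v - mean_y) ** 2 for v in y)
    sxy = sum((t - mean_t) * (v - mean_y) for t, v in zip(times, y))
    if sxx == 0 or syy == 0:
        return None
    if sxy / math.sqrt(sxx * syy) <= min_corr:
        return None
    return sxy / sxx


def estimate_scan_rate(times, ipids, ports):
    """Take the lesser of the IPID-based and port-based rates to counter overestimation."""
    rates = [r for r in (fit_rate(times, ipids), fit_rate(times, ports)) if r is not None]
    return min(rates) if rates else None


# Hypothetical observations: the IPID wraps once between t=10 and t=20.
times = [0, 10, 20, 30, 40]
ipids = [64000, 65000, 464, 1464, 2464]
ports = [1024, 1924, 2824, 3724, 4624]
print(estimate_scan_rate(times, ipids, ports))
```

Taking the minimum reflects the idea that other traffic from the same host can also advance either counter, so each fitted slope is at best an upper bound on the scan rate.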
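Extrapolate Global Scan Scope • Example: bot_i with n_i = 100 probes observed locally • Total scans from bot_i: scan rate R_i × scan time T_i = 100 × 1000 = 100,000 • Local/global ratio: n_i / (R_i × T_i) • Aggregate over multiple bots (Figure: botnets probing the IPv4 space)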
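The arithmetic on this slide can be worked through in a short sketch. The function names are hypothetical, and two steps are assumptions rather than something the slide states: bots are aggregated by pooling their local and global probe counts, and the global scope is taken as the sensor size divided by the local/global ratio.

```python
def local_global_ratio(observations):
    """observations: (n_local, scan_rate, scan_time) tuples, one per bot."""
    obs = list(observations)
    local = sum(n for n, _, _ in obs)
    total = sum(rate * duration for _, rate, duration in obs)
    if total <= 0:
        raise ValueError("estimated global probe count must be positive")
    return local / total


def global_scope(observations, sensor_size):
    """Assumed extrapolation: addresses covered globally = sensor addresses / ratio."""
    return sensor_size / local_global_ratio(observations)


# The slide's example bot: 100 local probes, rate 100, scan time 1000.
bots = [(100, 100, 1000)]
print(local_global_ratio(bots))   # 100 / 100,000
print(global_scope(bots, 2560))   # sensor of ten /24 networks = 2,560 addresses
```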
Extrapolate Global # of Bots • Idea: similar to mark and recapture • Assumption: all bots have the same global scan range • Bots seen in the first half: m1 = 1000 • Bots seen in the second half: m2 = 1000 • Bots observed in both halves: m12 = 250 • Estimated total: M = m1 × m2 / m12 = 4000
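The estimate M = m1 × m2 / m12 is the classic two-sample mark-and-recapture (Lincoln-Petersen) form. A minimal sketch, with a hypothetical function name and synthetic bot identifiers chosen to reproduce the slide's numbers:

```python
def estimate_total_bots(first_half, second_half):
    """Two-sample mark-and-recapture estimate: M = m1 * m2 / m12."""
    seen_first, seen_second = set(first_half), set(second_half)
    seen_both = seen_first & seen_second
    if not seen_both:
        raise ValueError("no overlap between the two halves; estimate undefined")
    return len(seen_first) * len(seen_second) / len(seen_both)


# Synthetic identifiers: 1000 bots per half, 250 seen in both.
first = range(0, 1000)
second = range(750, 1750)
print(estimate_total_bots(first, second))
```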
Agenda • Motivation • Basic framework • Discover the botnet probing strategies • Extrapolate global properties • Evaluation • Conclusions
Dataset • Based on a honeynet of 10 /24 networks at a national lab (LBNL) • 293 GB of packet traces over 24 months (2006-07) • 203 botnet probing events observed in total • Average observed # of bots per event: 980 • Mainly on SMB/WINRPC, VNC, Symantec, MSSQL, HTTP, Telnet • Size of the system: 13,900 lines: Bro (6,000), Python (4,000), C++ (2,500), R (1,400)
Property Checking Results • More than 80% uniform scanning • Validate the results through visualization and find the results are highly accurate.
Extrapolation Results • Most of the extrapolated global scopes are of /8 size, which means the botnets do not specifically target the enterprise (LBNL) • Validation with DShield data • DShield: the largest Internet alert repository • Find the /8 prefixes in DShield with sufficient source (bot) overlap with the honeynet events • Due to the incompleteness of DShield data, 12 events were validated • Calculate the scan scope in each /8 based on the sensor coverage ratio
Extrapolation Validation • Define the scope factor as max(DShield/Honeynet, Honeynet/DShield) • 75% of events are within 1.35 • All are within 1.5 (Figure: CDF of the scope factor)
Conclusions • Developed a set of statistical approaches to assess four properties of botnet probing strategies • Designed approaches to extrapolate the global properties of a scan event based on a limited local view • Through real-world validation based on DShield, we show that our scheme is promisingly accurate
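The scope factor is a symmetric ratio: it is always at least 1 and does not depend on which of the two estimates is larger. A small sketch with hypothetical function names; the sample values are made up for illustration and are not results from the study.

```python
def scope_factor(dshield_scope, honeynet_scope):
    """max(DShield/Honeynet, Honeynet/DShield); equals 1.0 when the two agree."""
    if dshield_scope <= 0 or honeynet_scope <= 0:
        raise ValueError("scopes must be positive")
    return max(dshield_scope / honeynet_scope, honeynet_scope / dshield_scope)


def fraction_within(factors, bound):
    """Share of events whose scope factor is at most the given bound."""
    factors = list(factors)
    return sum(1 for f in factors if f <= bound) / len(factors)


# Made-up factors, only to show how a "75% within 1.35" style figure is computed.
print(scope_factor(100, 135), scope_factor(135, 100))
print(fraction_within([1.1, 1.2, 1.3, 1.5], 1.35))
```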
Extrapolate the scope • Local/global ratio = probes observed locally / (estimated global probing rate × probing time window)
Monotonic trend checking • Goal: detect whether the bots probe the IP space monotonically • E.g. simple sequential probing • Technique: Mann-Kendall trend test • Intuition: check whether the aggregated sign value, the sum of sign(A_{i+1} - A_i), falls outside the range that randomness can produce • When most (>80%) senders in an event follow a trend, we label the event as following a trend
Uniformity Checking • Goal: detect whether the botnet scans the IP range uniformly • Technique: Chi-square test • Intuition: put addresses into bins; the number of scans observed in each bin should be similar • Significance level of 0.5%
Dependency Checking • Goal: do the bots try to stay out of each other's way? • Idea: count the number of addresses that receive zero scans and compare it with the confidence interval for the independent random case
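A minimal sketch of the Mann-Kendall test and the 80% labelling rule follows. It uses the standard form of the statistic, which sums the sign over all ordered pairs, with the normal approximation and no tie correction. The function names and the per-sender significance level `alpha=0.05` are assumptions; the slide gives only the 80% threshold.

```python
import math


def mann_kendall(values):
    """Return (S, z, two-sided p-value) using the normal approximation, no tie correction."""
    x = list(values)
    n = len(x)
    if n < 3:
        raise ValueError("need at least three observations")
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = x[j] - x[i]
            s += (diff > 0) - (diff < 0)  # sign of the difference
    var = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var)
    elif s < 0:
        z = (s + 1) / math.sqrt(var)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p


def event_follows_trend(per_sender_sequences, alpha=0.05, min_fraction=0.8):
    """Label the event as trending when more than min_fraction of senders show a trend."""
    seqs = list(per_sender_sequences)
    trending = sum(1 for seq in seqs if mann_kendall(seq)[2] < alpha)
    return trending / len(seqs) > min_fraction


# Hypothetical destination sequences: sequential probing versus an up-then-down pattern.
sequential = list(range(20))
tent = [0, 2, 4, 6, 8, 9, 7, 5, 3, 1]
print(mann_kendall(sequential)[0], mann_kendall(tent)[0])
print(event_follows_trend([sequential] * 9 + [tent]))
```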
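The binning intuition can be sketched with only the standard library. Because the standard library has no chi-square quantile function, the critical value here comes from the Wilson-Hilferty approximation, which is a substitution made for this sketch; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def chi_square_statistic(counts):
    """Pearson chi-square statistic against equal expected counts in every bin."""
    counts = list(counts)
    total, bins = sum(counts), len(counts)
    if bins < 2 or total <= 0:
        raise ValueError("need at least two bins and one observation")
    expected = total / bins
    return sum((c - expected) ** 2 / expected for c in counts)


def chi_square_critical(df, alpha):
    """Approximate upper-alpha quantile of chi-square(df) via Wilson-Hilferty."""
    z = NormalDist().inv_cdf(1 - alpha)
    a = 2 / (9 * df)
    return df * (1 - a + z * math.sqrt(a)) ** 3


def looks_uniform(counts, alpha=0.005):
    """True when uniformity is not rejected at the given significance level (0.5%)."""
    counts = list(counts)
    return chi_square_statistic(counts) <= chi_square_critical(len(counts) - 1, alpha)


# Hypothetical scan counts per address bin.
print(looks_uniform([100] * 10))
print(looks_uniform([500, 50, 50, 50, 50, 50, 50, 50, 50, 100]))
```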
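One way to sketch the independent random case is the classic occupancy model: if k probes land independently and uniformly on N sensor addresses, the number of untouched addresses has a known mean and variance. The sketch below builds a normal-approximation interval from them. The function names, the 95% confidence level, and the reading that too few untouched addresses suggests coordination are all assumptions made for illustration.

```python
import math
from statistics import NormalDist


def zero_scan_interval(num_addresses, num_probes, confidence=0.95):
    """Interval for the count of unprobed addresses under independent uniform probing."""
    n, k = num_addresses, num_probes
    if n < 2:
        raise ValueError("need at least two addresses")
    p_one = (1 - 1 / n) ** k   # a given address is never hit
    p_two = (1 - 2 / n) ** k   # two given addresses are both never hit
    mean = n * p_one
    var = n * p_one + n * (n - 1) * p_two - mean ** 2
    sd = math.sqrt(max(var, 0.0))
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * sd, mean + z * sd


def looks_coordinated(observed_zero, num_addresses, num_probes, confidence=0.95):
    # Assumption: bots that avoid each other's targets leave fewer addresses unprobed.
    low, _ = zero_scan_interval(num_addresses, num_probes, confidence)
    return observed_zero < low


# Hypothetical case: 2,560 sensor addresses receiving 2,560 probes in total.
print(zero_scan_interval(2560, 2560))
print(looks_coordinated(0, 2560, 2560), looks_coordinated(940, 2560, 2560))
```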