Detecting Botnets With Anomalous DNS Traffic. Wenke Lee and David Dagon Georgia Institute of Technology College of Computing {wenke, dagon}@cc.gatech.edu. Introduction. We summarize recent work on botnet detection and response One aspect of large sinkhole study “KarstNet” project
Detecting Botnets With Anomalous DNS Traffic Wenke Lee and David Dagon Georgia Institute of Technology College of Computing {wenke, dagon}@cc.gatech.edu
Introduction • We summarize recent work on botnet detection and response • One aspect of large sinkhole study • “KarstNet” project • Goal: stop botnets before they attack • Requires sensitive detection that identifies attack networks as they form.
Introduction • Significant, growing problem: botnets • Collectively, attackers are stronger • DDoS, spam-sending armies, distributed phishing • Botnets facilitate blended attacks and conduct lightning-fast mass attacks with new exploits: “The short vulnerability-to-exploitation window makes bots particularly dangerous” -- “Emerging Cybersecurity Issues Threaten Federal Information Systems”, http://www.gao.gov/cgi-bin/getrpt?GAO-05-231
Introduction • Botnet design goals: • Robustness: no simple point of failure • Mobility: Command and Control (C&C) can migrate to other networks • Stealth: difficult to detect • Key insight: • C&C is essential to a botnet. • Without C&C, bots are just discrete, unorganized infections
“The Rallying Problem” • C&C is used to “rally” victims into a network. • If we can detect C&C, we identify the botnet • Our goal: detect botnet during its formation, before it attacks (e.g., via DDoS) • Let’s reason like an attacker, to learn how to identify C&C traffic. • We’ll compare different attacker strategies to the attacker’s three design goals: • Robustness, Mobility, Stealth
Naïve First Virus • Suppose we write a virus. • We borrow from public repositories of virus source code. • 10 minutes later, we’ve compiled our first VB virus. • Payload: it spreads itself by email, and prints annoying messages to the screen. • We email it with some enticing content or other social engineering ploy. • What happens? [Diagram labels: VX, Usenet / Email, Virus. VX means “virus”.]
Naïve First Virus • The virus spreads to 10k victims (easily). • Congratulations, you’ve just graduated to the 1980s virus scene. • Let’s suppose we wanted to use the victim computers, instead of just harming them. [Diagram labels: VX, Usenet / Email, Virus; victims V1 through V9.]
(Still) Naïve Rallying • How can we find the victims? • Problem: Random victim propagation. • Simple (bad) idea: Victims e-mail their IP addresses • Problems: • Virus has to include author’s address (no stealth) • Single point of failure (not robust) • Virus has hardcoded address (not mobile, if author’s e-mail account is suspended) [Diagram labels: VX, Usenet / Email, Virus; victims V1, V2, V3, each sending its IP (“Victim1’s ip”, “Victim2’s ip”, “Victim3’s ip”).]
Naïve Rallying II • Another idea: The victims could post to Usenet, and the VXer could read the posts anonymously • We’ve just reinvented the early/mid-1990s VX scene • Problems: • Only somewhat robust • A few Usenet posts get dropped • Posting delays give DHCP victims time to change IPs • Not stealthy • AV companies and rival VXers obtain victim information • There’s a fairly public listing of who is infected • We want packets from the victims, not Usenet posts, since packets don’t usually leave a lasting record.
Naïve Rallying III • We use one victim as a web server, and all others contact this victim. The VXer just reads the httpd logs to identify victims. • Problems: • Not robust: single point of failure • Not very stealthy: hard-coded C&C IP [Diagram labels: VX, Usenet / Email, Virus, backdoor; victims V1, V2, V3, each sending its IP (“Victim1’s ip”, “Victim2’s ip”, “Victim3’s ip”).]
Rallying IV • Use an IRC network for rallying, with private (keyed) channels. • This is the late 1990s VX scene • Benefits: • Robust: IRCd hub/leaf design has no single point of failure • Problems: • Not very stealthy (careful reverse engineering of the binary can discover the channel key) • Not very mobile: once all IRCd operators ban the channel, the bots cannot move [Diagram labels: VX, Usenet / Email, Virus, IRC Network; victims V1, V2, V3.]
Rallying V • Attacker uses Dynamic DNS (DDNS) • Chooses an IRC network for victims and updates the resource record (RR) through DDNS. • Other robust rallying networks are possible (e.g., P2P) • DDNS is used by most (95%+) botnets, even those using non-IRCd rallying [Diagram labels: VX, Usenet / Email, Virus; victims V1, V2, V3; DDNS; IRC Network 1, IRC Network 2; messages “DDNS Update”, “DNS for hacker.org?”, “RR for hacker.org”, “SYN”.]
KarstNet Overview • 1: propagate; “www.hackers.com” coded in malware • 2: “www.hackers.com?” • 3: 10.0.0.1 (rallying box) • 3’: DNStop alert. DynDNS updates CNAME to point to GT sinkhole • 4 and 4’: arrows in the diagram, not captioned [Diagram labels: Malware Author; Victim Cloud (V1 through V5); Dynamic DNS; www.hackers.com 10.0.0.1 (Rallying box); Georgia Tech Sinkhole.]
DDNS Rallying • Note general properties of the hardcoded rallying address (a string): • Domain name purchases use traceable financial information. • Multiple third-level domains (3LDs) can use a DDNS service with one package deal. • Thus: botnet authors have financial and stealth motives to “reuse” one second-level domain (SLD) with numerous 3LDs. • Example: SLD evilhacker.org; 3LDs botnet1.evilhacker.org, botnet2.evilhacker.org, botnet3.evilhacker.org, …
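The SLD/3LD grouping described on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' tooling: the function names are hypothetical, and the code assumes a single-label TLD (such as .org or .com), so names under multi-label suffixes like .co.uk would need a public-suffix list.

```python
from collections import defaultdict


def split_name(hostname):
    """Return (sld, third_label) for a hostname.

    Assumption: the TLD is a single label, so the SLD is the last two labels.
    third_label is None when the name has no third-level label.
    """
    labels = hostname.lower().rstrip(".").split(".")
    if len(labels) < 2:
        raise ValueError(f"not a registrable name: {hostname!r}")
    sld = ".".join(labels[-2:])
    third = labels[-3] if len(labels) >= 3 else None
    return sld, third


def third_level_names_by_sld(hostnames):
    """Map each SLD to the set of distinct 3LDs observed beneath it."""
    groups = defaultdict(set)
    for name in hostnames:
        sld, third = split_name(name)
        bucket = groups[sld]  # creates an empty set for SLD-only names
        if third is not None:
            bucket.add(f"{third}.{sld}")
    return dict(groups)


observed = [
    "botnet1.evilhacker.org",
    "botnet2.evilhacker.org",
    "botnet3.evilhacker.org",
    "somebusiness.com",
]
groups = third_level_names_by_sld(observed)
```

An SLD with an unusually large set of 3LDs is the kind of candidate the following slides examine.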
DNS Rallying • Also, note DNS behavior of botnets • After boot, bots immediately resolve their C&C. • Exponential arrival of bot DNS requests, because of time zones, 9 a.m./5 p.m. schedules, etc. • Normal DNS behavior is not exponential. • Humans don’t immediately check the same server seconds after boot.
Detection Overview • Observation #1: Rates of 3LDs within an SLD are higher for botnets. • Easily detected when 3LD rates are factored into SLD rates • Observation #2: Rates of DNS requests for botnet domains are exponential. • Easily distinguished from normal DNS rate densities.
3LD/SLD Detection • We define the canonical DNS rate for SLDi as: [Equation not reproduced.] • We obtained a 2-week DNS sample from a DDNS provider and hand-identified the dozens of botnets for ground truth.
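The definition itself did not survive in this transcript. One reading that is consistent with Observation #1 ("3LD rates are factored into SLD rates") is to add the request rates of all 3LDs under an SLD to the SLD's own rate. The symbols below are assumptions introduced for illustration, not the authors' notation: $R$ is a DNS request rate, $C$ the canonical rate, and $n_i$ the number of 3LDs observed under $\mathrm{SLD}_i$.

```latex
C_{\mathrm{SLD}_i} \;=\; R_{\mathrm{SLD}_i} \;+\; \sum_{j=1}^{n_i} R_{\mathrm{3LD}_{i,j}}
```

Under this reading, an SLD with many busy 3LDs has a canonical rate well above its own direct request rate.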
3LD/SLD Detection • Detection via simple threshold and inequality: [Inequality not reproduced.]
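The slide does not preserve which inequality is used. As a hedged sketch, the code below assumes Chebyshev's inequality, $P(|X - \mu| \ge t) \le \sigma^2 / t^2$, which is a common distribution-free choice for this kind of threshold test. It also assumes the canonical rate is the SLD rate plus the sum of its 3LD rates. All function names, the baseline sample, and the 0.05 cutoff are illustrative assumptions.

```python
from statistics import mean, pstdev


def canonical_rates(sld_rates, third_level_rates):
    """Assumed canonical rate: SLD rate plus the sum of its 3LD rates.

    sld_rates: {sld: rate}; third_level_rates: {sld: {3ld: rate}}.
    """
    out = {}
    for sld in set(sld_rates) | set(third_level_rates):
        own = sld_rates.get(sld, 0.0)
        children = sum(third_level_rates.get(sld, {}).values())
        out[sld] = own + children
    return out


def flag_anomalous(rates, baseline, max_tail_prob=0.05):
    """Flag SLDs whose rate exceeds the baseline mean by a margin t such that
    the Chebyshev bound sigma^2 / t^2 is at most max_tail_prob."""
    mu = mean(baseline)
    sigma = pstdev(baseline)
    flagged = []
    for sld, rate in rates.items():
        t = rate - mu
        if t <= 0:
            continue  # at or below the baseline mean: never flagged
        if sigma == 0 or (sigma / t) ** 2 <= max_tail_prob:
            flagged.append(sld)
    return sorted(flagged)


# Hypothetical lookups-per-hour figures, for illustration only.
sld_rates = {"evilhacker.org": 5.0, "somebusiness.com": 12.0}
third_level_rates = {
    "evilhacker.org": {
        "botnet1.evilhacker.org": 40.0,
        "botnet2.evilhacker.org": 35.0,
        "botnet3.evilhacker.org": 50.0,
    }
}
baseline = [10.0, 12.0, 11.0, 9.0, 13.0]  # canonical rates of known-benign SLDs
rates = canonical_rates(sld_rates, third_level_rates)
suspects = flag_anomalous(rates, baseline)
```

Chebyshev's bound makes no assumption about the shape of the benign rate distribution, which is why it is a plausible fit for a "simple threshold"; a different inequality would change only the test inside `flag_anomalous`.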
3LD/SLD Detection • Assumptions: • Legitimate DDNS customers tend to have few 3LDs • Financial disincentives for web design (changes require DNS updates) • Subdirectories are easier to create (HTML skills vs DNS skills) • Customers expect [Diagram: SLD: somebusiness.com; 3LDs: products.somebusiness.com/, orders.somebusiness.com/; Subdirectories: somebusiness.com/products, somebusiness.com/orders.]
Rate Detection • Most victim (home) computers are turned on/off periodically (note the strong diurnal pattern). • A second detection layer: • Take DNS rates for all hosts, and sort by lookups per time unit for a small (e.g., 12-hour) window • The botnet hosts have exponential “spikes” as victims rally • Normal traffic is smoother (Poisson arrival) [Figure: Activity (SYN rate) of a large 350K+ member botnet.]
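The second detection layer can be sketched as follows: bin the lookup timestamps for one name into fixed time units over a short window, then sort the per-bin counts in descending order. The function name, the one-hour bin, and the use of epoch-second timestamps are assumptions for illustration; the slide specifies only the sort by lookups per time unit and the example 12-hour window.

```python
def sorted_rate_profile(timestamps, window_start, window_len=12 * 3600, bin_len=3600):
    """Return per-bin lookup counts for one name, sorted from busiest to quietest.

    timestamps and window_start are in seconds; lookups outside the window
    [window_start, window_start + window_len) are ignored.
    """
    n_bins = -(-window_len // bin_len)  # ceiling division
    counts = [0] * n_bins
    for ts in timestamps:
        offset = ts - window_start
        if 0 <= offset < window_len:
            counts[int(offset // bin_len)] += 1
    return sorted(counts, reverse=True)


# Hypothetical lookups: three in the first hour, one each in two later hours,
# plus two that fall outside the 12-hour window.
start = 1000
lookups = [1000, 1001, 1002, start + 3600, start + 5 * 3600 + 5, 999, start + 12 * 3600]
profile = sorted_rate_profile(lookups, start)
```

A rallying botnet should produce a profile dominated by a few very large bins, while ordinary traffic should give a flatter one; comparing such profiles is the job of the distance measures on the next slide.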
Rate Detection • Differentiate densities with various measures • Mahalanobis distance • K-S distance
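Both measures can be sketched in plain Python. The K-S (Kolmogorov-Smirnov) distance below is the standard two-sample statistic: the largest gap between two empirical CDFs. The Mahalanobis version is deliberately simplified and assumes a diagonal covariance matrix (independent features); a full implementation would invert the complete covariance matrix. Function names and sample values are illustrative assumptions.

```python
from math import sqrt


def ks_distance(sample_a, sample_b):
    """Two-sample K-S distance: the maximum gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    i = j = 0
    gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both CDFs past every observation equal to x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    return gap


def mahalanobis_diag(x, mean_vec, variances):
    """Mahalanobis distance under the simplifying assumption of a diagonal
    covariance matrix, i.e. each squared deviation is scaled by its variance."""
    if not (len(x) == len(mean_vec) == len(variances)):
        raise ValueError("x, mean_vec and variances must have equal length")
    return sqrt(sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean_vec, variances)))


# Hypothetical per-hour lookup counts for a benign name and a suspect name.
benign = [4, 5, 5, 6, 4, 5]
suspect = [90, 60, 2, 1, 0, 0]
d_ks = ks_distance(benign, suspect)
```

A large distance between a name's rate density and the benign baseline would mark it for closer inspection; where to set the cutoff is not given on the slide.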
Assumptions • DNS rates for DDNS providers differ from other networks. • These detection techniques are specific to DDNS providers. • Currently, most (95%+) of studied botnets use DDNS
Response • We’ve focused on detection, so we’ll just note response options: • Recording victim IPs (blacklist routing) • Contacting upstream ISPs • Sinkholing • DDNS provider offers RR of sinkhole IP
(Other Work) • Time permits only brief mention of other benefits: • Accurate propagation models based on actual data—a first! • Rank ordering of malware importance, based on expected propagation rates. • Design of next-generation proxypots and honeypots
Conclusion • Botnets: a significant problem • Goal: detect victim cloud prior to botnet attacks (e.g., DDoS) • Insight: botnets must use C&C • Detection: • For DDNS, detection is possible with 3LD/SLD-adjusted rates and sorted rate densities.