This research paper explores the various aspects of botnets, focusing on IRC-based botnets. It covers data collection methodologies, analysis results, and related work, providing a comprehensive understanding of the botnet phenomenon.
A Multifaceted Approach to Understand the Botnet Phenomenon Published: Internet Measurement Conference (IMC) 2006 Presented by Wei-Cheng Xiao
Outline • Introduction • Overview of IRC-based botnets • Data collection methodology • Analysis results • Related work • Conclusion
Introduction • Botnet: a network of infected hosts, called bots, that are controlled by botmasters • The defining characteristic of botnets • The command and control (C&C) channel • Communication mechanisms • IRC (the majority, easy to distribute) • P2P • HTTP
Why Choose IRC? • Supports several forms of communication • Point-to-point, point-to-multipoint • Supports several forms of data dissemination • Open-source implementations are available
Motivation and Goals • Motivation • Botnet activity is increasing, but little is known about botnet behavior. • Goals • Gain a better understanding of botnets, including • the prevalence of botnet activity • the diversity of botnet subspecies • the evolution of a botnet
Contributions • The development of a multifaceted infrastructure to capture and concurrently track multiple botnets in the wild • A comprehensive analysis of measurements reflecting several important structural and behavioral aspects of botnets
Step 1: Exploit • Exploit a software vulnerability on the victim host • via worms or malicious email attachments
Step 2: Download Bot Binary • Execute shellcode to download the bot binary from a specific location and install it
Step 3: DNS Lookup (optional) • Resolve the domain name of the IRC server hard-coded in the binary • Avoids server unavailability due to IP blocking
Step 4: Join • Join the IRC server and C&C channel listed in the binary • Three types of authentication • Bots authenticate to join the server using passwords in the binary • Bots authenticate to join the C&C channel using passwords in the binary • Botmasters authenticate to the bot population to send commands
Step 5: Parse and Execute Commands • Parse commands from the channel topic and execute them • The topic contains default commands for all bots
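Step 5 can be illustrated from an analyst's point of view with a minimal Python sketch that pulls the topic out of a logged IRC line and splits it into individual commands. The RPL_TOPIC (332) and TOPIC message layouts follow the IRC protocol; the `|` separator, the function names, and the sample commands are assumptions for illustration, since each botnet uses its own command syntax.

```python
def extract_topic(raw_line):
    """Return (channel, topic) from an RPL_TOPIC (332) or TOPIC line, else None."""
    line = raw_line.rstrip("\r\n")
    # Drop the optional ":prefix" that identifies the sender.
    if line.startswith(":"):
        _prefix, _, line = line.partition(" ")
    # The topic text is the trailing parameter, introduced by " :".
    head, sep, trailing = line.partition(" :")
    if not sep:
        return None
    parts = head.split()
    if not parts:
        return None
    if parts[0] == "332" and len(parts) >= 3:
        # 332 <our nick> <channel> :<topic>
        return parts[2], trailing
    if parts[0].upper() == "TOPIC" and len(parts) >= 2:
        # TOPIC <channel> :<topic>
        return parts[1], trailing
    return None


def split_commands(topic, separator="|"):
    """Split a topic into commands; the separator is a hypothetical choice."""
    return [cmd.strip() for cmd in topic.split(separator) if cmd.strip()]


logged = ":irc.example.net 332 drone1 #chan :.example.cmd 1 2 | .other.cmd"
channel, topic = extract_topic(logged)
commands = split_commands(topic)
```

Here `commands` holds one entry per default command in the topic, which is the list every newly joined bot would act on.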
The Three Main Phases • Malware collection • Goal: collect bot binaries • Binary analysis via gray-box testing • Goal: analyze the binaries • Longitudinal tracking of botnets • Goal: track real botnets using the analysis results
Phase 1: Malware Collection Darknet: an allocated but unused portion of the IP address space
Malware Collection • Environment setup • 14 nodes are distributed across the PlanetLab testbed. • These nodes have access to the darknet, whose IP space is located in 10 different class A networks. • Nepenthes • Mimics replies generated by vulnerable services to obtain shellcode • Passes URLs found in the shellcode to the download station, which fetches the bot binaries (why?) • Honeynet • Handles cases where Nepenthes fails • Runs unpatched Windows XP in VMs • VLAN
Gateway • Routes darknet traffic to Nepenthes and honeypots • half to Nepenthes, half to honeypots • Rotates routing among 8 class-C networks in the darknet • Uses NAT to keep the # of honeypots small • Acts as a firewall to prevent honeypots from launching outgoing attacks and from cross-infecting each other (VLAN) • Detects and manages IRC connections
Phase 2: Binary Analysis (gray-box)
Binary Analysis • Environment setup • A sink (IRC server) monitors all network traffic. • A client, a VM with a clean Windows XP installation on which the binary is executed, is connected to the sink. • Two steps • Creating network fingerprints • Extracting IRC-related features
The Two Steps • Creating network fingerprints (network level) • fnet = {DNS, IPs, Ports, Scan} • DNS: targets of any DNS requests • IPs: destination IP addresses • Ports: contacted ports on the server side • Scan: whether or not the IP scanning behavior is detected • Extracting IRC-related features (application level) • When an IRC session is detected, an IRC-fingerprint is created: • firc = {PASS, NICK, USER, MODE, JOIN}. • fnet and firc provide enough information to join a botnet in the wild.
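As a rough Python sketch of the application-level step, the IRC fingerprint can be built by recording the first PASS, NICK, USER, MODE, and JOIN messages the client sends to the sink. The five field names come from the slide; the function name, the "first occurrence wins" rule, and the sample session are assumptions for illustration only.

```python
IRC_FIELDS = ("PASS", "NICK", "USER", "MODE", "JOIN")


def build_irc_fingerprint(client_lines):
    """Map each fingerprint field to the parameters of its first occurrence."""
    fingerprint = {}
    for raw in client_lines:
        line = raw.strip()
        if not line:
            continue
        # An IRC client message is "<COMMAND> <params...>".
        command, _, params = line.partition(" ")
        command = command.upper()
        if command in IRC_FIELDS and command not in fingerprint:
            fingerprint[command] = params
    return fingerprint


# Hypothetical session as it might be logged at the sink.
session = [
    "PASS secret",
    "NICK bot-1234",
    "USER a b c :d",
    "JOIN #chan key",
    "PRIVMSG #chan :hi",
]
firc = build_irc_fingerprint(session)
```

Fields the client never sends (MODE in this sample) are simply absent from the result, so a tracker can tell which parts of the login sequence it still has to guess.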
Dialect • Dialect: the syntax of botmasters’ commands and their responses • Learning a botnet’s dialect is required for mimicking actual bot behavior. • An IRC query engine plays the role of botmaster. • Commands come from • those observed in honeypots • the source code of publicly known bots • The output of the querying process becomes the template.
IRC Tracker (Drone) • An IRC client that can join a real-world IRC channel • A drone is given firc and the template. • Automatically answers queries based on the template • Pretends to be a dutiful bot • Must be intelligent enough • Mimicry improvements • Randomly join and leave • Change external IP
DNS Tracking • Most bots find their IRC servers via DNS queries. • Probe about 800,000 real-world DNS servers • Query the domain names of the IRC servers • A cache hit implies one or more bots • Shortcomings • Not all DNS servers are probed. • The # of hits provides only a lower bound on the # of bots. • Still useful when the broadcast feature in a botnet is turned off.
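The lower-bound argument can be sketched in Python as a simple aggregation over probe results. How a cache hit is detected (for example, with a non-recursive query) is outside this sketch; the record layout, function name, and sample domains are assumptions for illustration.

```python
from collections import defaultdict


def lower_bound_per_domain(probe_results):
    """Count distinct DNS servers with a cache hit for each tracked domain.

    Each hit implies at least one bot behind that server, so the count is
    only a lower bound on the number of bots.
    """
    hits = defaultdict(set)
    for dns_server, domain, cached in probe_results:
        if cached:
            hits[domain].add(dns_server)
    return {domain: len(servers) for domain, servers in hits.items()}


# Hypothetical probe results: (DNS server, IRC server domain, cache hit?)
results = [
    ("r1", "irc.example.org", True),
    ("r2", "irc.example.org", False),
    ("r3", "irc.example.org", True),
    ("r1", "irc.example.org", True),   # repeated probe, same server
    ("r1", "c2.example.net", True),
]
bounds = lower_bound_per_domain(results)
```

Repeated hits on the same server are counted once, because a single cached entry cannot distinguish one bot from many.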
Data collected • Started from Feb. 1st, 2006, including • Traffic traces over the span of 3 months • IRC logs over the span of 3 months, covering data from more than 100 botnet channels • Results of DNS cache hits from tracking 65 IRC servers on 800,000 DNS servers for more than 45 days
Botnet Traffic Share • 27% of SYNs are from known botnet spreaders. • 76% of SYNs are directed at target ports. • The two curves reveal a similar traffic pattern. • This is a lower-bound estimate.
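The two percentages are plain ratios over the observed SYN packets. A minimal Python sketch, assuming each SYN is logged as a (source IP, destination port) pair and that the sets of known spreaders and target ports are already available; all names and sample values are hypothetical.

```python
def syn_shares(syn_records, known_spreaders, target_ports):
    """Return (share from known spreaders, share aimed at target ports)."""
    total = len(syn_records)
    if total == 0:
        return 0.0, 0.0
    from_spreaders = sum(1 for src, _port in syn_records if src in known_spreaders)
    to_targets = sum(1 for _src, port in syn_records if port in target_ports)
    return from_spreaders / total, to_targets / total


# Hypothetical trace using documentation address ranges.
records = [
    ("192.0.2.1", 445),
    ("192.0.2.2", 80),
    ("192.0.2.1", 135),
    ("198.51.100.7", 445),
]
spreader_share, target_share = syn_shares(records, {"192.0.2.1"}, {445, 135})
```

Because only sources already known to be spreaders are counted, the first ratio can only underestimate the true botnet share, which is why the slide calls it a lower bound.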
Botnet Prevalence: A Global Look • About 85,000 DNS servers are involved in at least one botnet activity.
Botnet Spreading Patterns • Two types of botnet: • Type-I: fixed scanning algorithm • Type-II: variable scanning algorithm • Out of 192 IRC bots, 34 are Type-I. Summary of Type-II scanning practice
Predominant Botnet Structures • Single IRC server (70%) • Prevalent among small botnets • Multiple IRC servers, bridged botnet (30%) • 25% of which are publicly known servers • A botmaster controls multiple botnets • Some botnets migrate
Effective Botnet Sizes and Botnet Lifetime • Effective size: the # of online bots • The observed effective size was much smaller than the footprint. • Bots usually stay connected for only 25 minutes. • Possibly due to client unavailability • More likely, botmasters instruct them to leave. • Botnets, however, have long lifetimes. • 84% of IRC servers were still up at the end of the study.
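The gap between footprint and effective size can be illustrated with a small Python sketch over a channel's join/leave log. It assumes the footprint is approximated by the number of distinct bot identifiers ever seen and reports the peak number online at once; the event format and identifiers are hypothetical, and real nicknames are not reliable identities.

```python
def footprint_and_peak_effective_size(events):
    """Return (distinct bots ever seen, peak number of bots online at once)."""
    seen = set()
    online = set()
    peak = 0
    # Events are (timestamp, bot_id, action) with action "join" or "leave".
    for _timestamp, bot_id, action in sorted(events):
        seen.add(bot_id)
        if action == "join":
            online.add(bot_id)
        elif action == "leave":
            online.discard(bot_id)
        peak = max(peak, len(online))
    return len(seen), peak


# Hypothetical log: bots join briefly and leave, so few are online together.
log = [
    (0, "a", "join"),
    (5, "b", "join"),
    (10, "a", "leave"),
    (12, "c", "join"),
    (20, "b", "leave"),
    (30, "d", "join"),
]
footprint, peak_online = footprint_and_peak_effective_size(log)
```

With short connection times, the number of bots seen over the whole log grows steadily while the number online at any moment stays small.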
Related Work • Botnet Tracking: Exploring a Root-Cause Methodology to Prevent DoS Attacks. ESORICS, 2005 • Introduces the idea of using honeypots and active responders to analyze botnet behavior • Scalability, Fidelity, and Containment in the Potemkin Virtual Honeyfarm. ACM SIGOPS, 2005 • A very useful tool for botnet detection, but not appropriate for long-term botnet tracking
Conclusion • A multifaceted approach is proposed to understand the botnet phenomenon. • The results show that botnets are a major contributor to unwanted network traffic. • The scanning patterns of botnets are quite different from those of autonomous malware. • The effective sizes of botnets are much smaller than their footprints.