Exploiting Temporal Persistence to Detect Covert Botnet Channels

Frederic Giroire (CNRS, France), Jaideep Chandrashekar, Nina Taft, Eve Schooler, and Dina Papagiannaki (Intel Research). RAID’09

Outline
• Introduction
• Temporal Persistence
• Design and Implementation
• Dataset and Evaluation
• Conclusion and Comments
Speaker: Li-Ming Chen

Botnet
• Botnet
  • A botnet is a collection of compromised end hosts
  • Under the control of a bot-master
  • Through a command and control (C&C) channel
  • Used to launch various malicious activities
    • DDoS, spamming, stealing private data, etc.
• Why are botnets so common and dangerous?
  • Low maintenance cost and ease of use (e.g., through IRC)
  • Non-technical criminals can buy or rent botnets
  • → Botnet-based underground economy

Botnet Detection
• Traditional intrusion detection:
  • Misuse detection
    • Drawback: only works for known attacks, and is easy to evade
  • Anomaly detection
    • Can detect activated zombie hosts
    • But only after a delay: the time between a host joining a botnet and being instructed to carry out a malicious task
• Directions for mitigating botnet problems
  • 1.) Prevent the recruitment
  • 2.) Detect the covert C&C channel (focus)
  • 3.) Detect attacks being carried out by the bots

Botnet Detection (related work)
• Anomaly-based detection of IRC channels (based on protocol/payload analysis)
• BotHunter (USENIX Sec.’07): chains together various alarms to detect a whole (or partial) botnet lifecycle
• BotSniffer (NDSS’08): focuses on detecting the C&C server
  • Similar behaviors toward the same destination (centralized botnet)
• BotMiner (USENIX Sec.’08): clusters attack traffic and normal (C&C) traffic, then performs cross-clustering to identify hosts that undertake both kinds of communication

Objective of this Paper
• Aim to detect botnet C&C communications on an end host
  • Define “destination atoms”
  • Measure the temporal regularity (persistence) of individual destination atoms on each end host
  • Identify suspicious C&C communications
• Compared with other detection techniques, the method:
  • Does not attempt to identify attack traffic in the traffic stream
  • Does not attempt to correlate activities across hosts

Observations
• C&C traffic:
  • Each bot needs to communicate regularly with a C&C server
    • This behavior is common across different bots
  • The C&C communication might be very stealthy
    • To avoid being detected
  • However, without “frequent” communication with a C&C server, the bot becomes invisible to the bot-master
    • So the bot still needs to maintain this communication over time
  • → C&C communication may be low-frequency but persistent

Observations (cont’d)
• Normal communications
  • On any particular day, an end host may communicate with a large set of destination end-points
  • However, most of these destinations are transient
    • Contacted a few times and never again
  • A smaller, stable set of destinations is visited regularly
    • Work-related sites, news/entertainment websites, sites contacted by applications
  • → Need to distinguish C&C traffic from these

Approach (how to exploit temporal persistence to detect botnet channels)
• Introduce a notion called “destination atoms” and a metric called “persistence” to capture lightweight yet regular communication
• Training:
  • Persistent destination atoms are added to a host’s whitelist during a training period
  • The whitelist requires infrequent updating (due to the persistence)
• Test:
  • Track the persistence of new destination atoms that are not already whitelisted → identify the C&C traffic and destination
• For stealthy attacks:
  • Track persistence at multiple timescales concurrently

Destination Atoms
• A destination atom is an aggregation of destinations
  • Only the network service being connected to matters, not so much the actual destination IP address
  • E.g., the particular addresses that respond to google.com vary by location and time (but the user just wants to access the Google service)
• Mapping:
  • Given (dstIP, dstPort, proto) → obtain the atom (dstService, dstPort, proto)

Example of Destination Atoms
[Table: destination atoms contacted by somehost.intel.com]

How to Extract Services? (by heuristic)
• 1. If the source and destination belong to different domains, the service name is simply the second-level domain name of the destination
  • E.g., google.com, yahoo.com
• 2. If the source and destination belong to the same domain, the service name is the third-level domain name
  • E.g., mail.intel.com, print.intel.com

How to Extract Services? (by heuristic) (cont’d)
• 3. Use application-level information (when higher-level application semantics are available)
  • E.g., destination atom for an FTP service: (ftp.service.com, 21:>1024, tcp)
• 4. Use the destination port to distinguish services on a single destination host that provides a number of distinct services
• 5. When an address cannot be mapped to a name, use the IP address as the service name
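
Heuristics 1, 2 and 5 can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function names are hypothetical, the destination name is assumed to come from a reverse lookup done elsewhere (None if it failed), and taking the last k labels of a hostname is a naive stand-in for real domain parsing (it would mishandle suffixes such as co.uk).

```python
from typing import Optional, Tuple

Atom = Tuple[str, int, str]  # (dstService, dstPort, proto)


def last_labels(hostname: str, k: int) -> str:
    """Return the last k dot-separated labels of a hostname (naive split)."""
    labels = hostname.lower().rstrip(".").split(".")
    return ".".join(labels[-k:])


def service_name(src_host: str, dst_host: Optional[str], dst_ip: str) -> str:
    """Map a destination to a service name using heuristics 1, 2 and 5."""
    if not dst_host:
        # Heuristic 5: no name available, fall back to the IP address.
        return dst_ip
    if last_labels(dst_host, 2) != last_labels(src_host, 2):
        # Heuristic 1: different domains, use the second-level domain.
        return last_labels(dst_host, 2)
    # Heuristic 2: same domain, use the third-level domain.
    return last_labels(dst_host, 3)


def destination_atom(src_host: str, dst_host: Optional[str],
                     dst_ip: str, dst_port: int, proto: str) -> Atom:
    """Build the atom (dstService, dstPort, proto) for one connection."""
    return (service_name(src_host, dst_host, dst_ip), dst_port, proto)
```

For example, a connection from somehost.intel.com to a host named www.l.google.com on TCP port 80 maps to the atom (google.com, 80, tcp), whichever Google address actually answered.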

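Persistence
• (Sliding) observation window W, divided into n measurement windows: W ≡ [s_1, s_2, …, s_n]
• The persistence of a destination atom d in the observation window W is defined as:
  $$p(d, W) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}_{d, s_i}$$
  • The indicator function $\mathbf{1}_{d, s_i}$ returns 1 if d is contacted during measurement window s_i (s_i > 0) and 0 otherwise (s_i = 0)
• d is said to be persistent if $p(d, W) > p^*$, where p* is a pre-defined threshold
[Figure: Host A generates outgoing traffic over time; the timeline is divided into measurement windows s]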
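
As a small worked example of the definition, the Python sketch below computes the fraction of measurement windows in which an atom was contacted. The function names are hypothetical, the input is assumed to be the list of per-window connection counts for one atom, and the default threshold of 0.6 is the value chosen later in the talk.

```python
from typing import Sequence


def persistence(slot_counts: Sequence[int]) -> float:
    """Fraction of measurement windows with at least one connection to the atom."""
    if not slot_counts:
        raise ValueError("observation window must contain at least one slot")
    active_slots = sum(1 for count in slot_counts if count > 0)
    return active_slots / len(slot_counts)


def is_persistent(slot_counts: Sequence[int], p_star: float = 0.6) -> bool:
    """An atom is persistent if its persistence exceeds the threshold p*."""
    return persistence(slot_counts) > p_star
```

With n = 10 windows, an atom contacted in 7 of them has persistence 0.7 and is flagged, no matter how many connections fell into each window; only whether a window was active matters.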

Persistence in Multiple Timescales
• Botnets differ from one another, and we cannot know a priori the frequency of C&C communication
• → Need a method that can track persistence over several observation windows simultaneously
• Timescales:
  • Select k overlapping timescales, from the smallest timescale to the largest
  • For each timescale j, compute the persistence $p^{(j)}(d)$
  • d is judged persistent if $\max_j p^{(j)}(d) > p^*$

Multiple Timescales - Implementation
• Size of the measurement window
  • {s1, s2, …, s7} = {1, 4, 8, 12, 16, 20, 24} (hours)
  • In preliminary analysis, 87% of connections to the same destination atom are separated by at least 1 hour
• Choose n = 10, so Wj = n * sj
  • Smallest timescale: (Wmin = 10, smin = 1); largest timescale: (Wmax = 240, smax = 24)
• Implementation
  • k separate bitmaps? Not necessary
  • Each slot sj is covered by a slot in the next higher timescale, whose bit is obtained with an OR operation over bits of the smallest-timescale bitmap
  • → Therefore, only a single long bitmap that covers all the timescales needs to be constructed
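
The single-bitmap idea can be sketched as follows. This is an assumption-laden illustration rather than the authors' implementation: the bitmap is taken to be a plain list of 0/1 values, one per hour, oldest first and at least 240 entries long, and coarser slots are aligned to the most recent hour, which may differ from the alignment used in the paper. The names are hypothetical.

```python
from typing import Sequence

SLOT_HOURS = (1, 4, 8, 12, 16, 20, 24)  # measurement window sizes s_j
N_SLOTS = 10                            # n, so that W_j = n * s_j


def persistence_at(hourly_bits: Sequence[int], slot_hours: int,
                   n: int = N_SLOTS) -> float:
    """Persistence at one timescale, derived from the hourly bitmap."""
    needed = n * slot_hours
    if len(hourly_bits) < needed:
        raise ValueError("bitmap is shorter than the observation window")
    window = hourly_bits[-needed:]  # most recent W_j hours
    active = 0
    for i in range(n):
        chunk = window[i * slot_hours:(i + 1) * slot_hours]
        # OR over the hourly bits that make up one coarser slot.
        if any(chunk):
            active += 1
    return active / n


def max_persistence(hourly_bits: Sequence[int]) -> float:
    """Largest persistence over all timescales (compared against p*)."""
    return max(persistence_at(hourly_bits, s) for s in SLOT_HOURS)
```

A bot that contacts its server once every 24 hours has near-zero persistence at the 1-hour timescale but reaches 1.0 at the 24-hour timescale, which is why the maximum over timescales is the quantity compared against the threshold.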

Compute Persistence
[Figure: algorithm listing with annotations]
• Runs once per smin: computes persistence for all destination atoms
• Bitmaps are stored in the DCT, indexed by individual atoms d
• For each atom d, persistence is computed at multiple timescales
• Each bitmap is a ring buffer, described by the bitmap length and an index (idx) for each bit
• A separate process handles each outgoing connection and checks whether the destination atom is whitelisted

Whitelist: Training and Detection
• The training and detection stages proceed (almost) identically
  • The persistence of destinations is tracked, and an alarm is raised when it crosses a specified threshold
• Training:
  • An alarm simply results in the atom being inserted into the whitelist
• Detection:
  • Check the whitelist
  • The alarm is exposed to the end user for further analysis (benign: insert into the whitelist; malicious: block connections)
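
The two stages can be sketched with one small class. This is a simplified illustration under stated assumptions: it tracks a single timescale only, the class and method names are hypothetical, and what happens to an atom after a detection alarm (here its history is dropped and the decision is left to the caller) is a design choice of this sketch, not something the slides specify.

```python
from collections import deque
from typing import Deque, Dict, Hashable, List, Set


class PersistenceTracker:
    """Tracks per-atom persistence over the last n slots (single timescale)."""

    def __init__(self, p_star: float = 0.6, n_slots: int = 10) -> None:
        self.p_star = p_star
        self.n_slots = n_slots
        self.whitelist: Set[Hashable] = set()
        self.history: Dict[Hashable, Deque[int]] = {}
        self.seen_now: Set[Hashable] = set()

    def observe(self, atom: Hashable) -> None:
        """Record an outgoing connection; whitelisted atoms are ignored."""
        if atom not in self.whitelist:
            self.seen_now.add(atom)

    def end_slot(self, training: bool) -> List[Hashable]:
        """Close the current slot and return atoms that crossed p*."""
        for atom in self.seen_now:
            # Ring buffer of 0/1 bits, one per slot, initially all zero.
            self.history.setdefault(
                atom, deque([0] * self.n_slots, maxlen=self.n_slots))
        alarms = []
        for atom in list(self.history):
            bits = self.history[atom]
            bits.append(1 if atom in self.seen_now else 0)
            if sum(bits) / self.n_slots > self.p_star:
                alarms.append(atom)
                del self.history[atom]
                if training:
                    # Training: an alarm just whitelists the atom.
                    self.whitelist.add(atom)
        self.seen_now.clear()
        return alarms
```

During detection the caller would pass training=False and show the returned atoms to the end user, who either adds them to the whitelist or blocks them.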

End Host Traffic Traces
• Collected at 157 hosts over a 5-week period (2007/1~2)
  • All packet headers are collected
• Data is divided into training and testing sets
  • The training set is used to determine the threshold and build the per-user whitelists
  • The testing set is used to assess the detection performance
    • FP rate and FN rate

Botnet Traffic Traces
• Collect botnet binaries, execute them on a WinXP SP2 VM, and generate botnet traffic
  • No other IP traffic is sent out of the VM
  • Hard work: binaries crashed or could not find their C&C server; only 12 binaries worked
• In the test dataset, the botnet traffic is overlaid on top of the normal traffic traces (conn./min.)

System Properties
• For the system to work well, whitelists should be:
  • Stable, changing infrequently
  • Small (smaller is better; it speeds up lookups)
• Select p* = 0.6
• The whitelist size is small and manageable (a user typically has few persistent destination atoms)
[Figure: CDF of p(d) across all the atoms seen in the training data (157 users in total)]
[Figure: Distribution of per-host whitelist sizes computed using p* = 0.6]

C&C Detection
• Timescales: s = (1, 4, 8, 12, 16, 20, 24) hours, W = 10 * s
• Bot trace data is overlaid on top of each of the 157 user traces
[Table: various properties of the detected botnets]
• Table annotations: stealthy botnets; a botnet might use multiple timescales for different destination atoms; non-centralized (P2P) botnets are also detected (> 0.6)

C&C Detection (cont’d)
• Use an ROC curve to relate the FP rate and the detection rate
• In an enterprise network the FP rate might be low (well-behaved users); in the real world, however, the FP rate will rise
  • Whitelist applications, e.g., BitTorrent
[Figure: FP across users (p* = 0.6; 157 users in total; avg. 5.3 benign destination atoms per user). A small number of users see a large number of alarms]
[Figure: ROC curve; the knee marks the best threshold]

Detecting Botnet Attack Traffic
• Study how whitelists can boost the detection rates of more traditional volume-based anomaly detectors
  • Whitelists → known good destinations
  • Traffic going to these destinations must be “anomaly free” (can be filtered out)
• Use a simple connection-count detector with a 99.9th-percentile threshold
  • After filtering, the detection rate is better (e.g., Aimbot-25, VB-666)
  • The benefit of filtering is apparent when the botnet traffic volumes are low to moderate
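
The filtering effect can be illustrated with a toy connection-count detector. Everything here is a sketch under assumptions: the function names are hypothetical, connections are assumed to be (minute, atom) pairs, the percentile uses the nearest-rank method, and minutes with no connections are simply not counted.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, List, Set, Tuple

Connection = Tuple[int, Hashable]  # (minute index, destination atom)


def counts_per_minute(connections: Iterable[Connection],
                      whitelist: Set[Hashable]) -> Counter:
    """Count connections per minute, skipping whitelisted destinations."""
    counts: Counter = Counter()
    for minute, atom in connections:
        if atom not in whitelist:
            counts[minute] += 1
    return counts


def percentile_threshold(values: List[int], q: float = 99.9) -> int:
    """Nearest-rank q-th percentile of the observed per-minute counts."""
    if not values:
        raise ValueError("no training data left after filtering")
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100.0 * len(ordered)))
    return ordered[rank - 1]


def anomalous_minutes(train: Iterable[Connection], test: Iterable[Connection],
                      whitelist: Set[Hashable]) -> List[int]:
    """Minutes in the test data whose count exceeds the training threshold."""
    train_counts = counts_per_minute(train, whitelist)
    threshold = percentile_threshold(list(train_counts.values()))
    test_counts = counts_per_minute(test, whitelist)
    return sorted(m for m, c in test_counts.items() if c > threshold)
```

When benign traffic to whitelisted destinations is bursty, it pushes the unfiltered threshold up and hides a small burst of bot connections; with the whitelisted traffic removed, the same burst stands out against a much lower threshold.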

Conclusions
• Introduce “persistence” as a temporal measure of regularity in connections to “destination atoms”
  • Persistence does not require any protocol semantics, nor looking inside payloads, to detect the malware
• Describe a method that builds whitelists of known good destination atoms
  • In order to isolate the remaining persistent destinations (likely C&C channels)
• Evaluation shows that the proposed method successfully identified C&C destinations in every botnet instance
• The proposed method can also boost traditional detection algorithms by filtering traffic

My Comments
• Uses a multi-resolution approach to explore the temporal behavior of a bot
  • A bot connects to its C&C server(s) periodically
• Can cooperate with other botnet detection techniques (not host-based)
• In detection, raising an alarm does not imply that an attack has been found
  • Further analysis of the destination and the traffic is required
• What are the limitations of the multi-resolution approach?
