Building a Time Machine for Efficient Recording and Retrieval of High-Volume Network Traffic
Stefan Kornexl¹, Vern Paxson², Holger Dreger¹, Anja Feldmann¹, Robin Sommer¹
¹TU München, ²ICSI/LBNL
Internet Measurement Conference (IMC) 2005
Reference
• Stefan Kornexl, Vern Paxson, Holger Dreger, Anja Feldmann, Robin Sommer, “Building a Time Machine for Efficient Recording and Retrieval of High-Volume Network Traffic,” 5th ACM IMC 2005.
• Stefan Kornexl, “High-Performance Packet Recording for Network Intrusion Detection,” master’s thesis, 2005.
• Time Machine webpage: http://www.net.t-labs.tu-berlin.de/research/tm/
Speaker: Li-Ming Chen
Outline
• Motivation and Goals
• Feasibility Study (trace-driven simulation)
• System Architecture
• Performance Evaluation
• Conclusion and Comments
Motivation
• Packet recording is widely considered a major benefit for network security monitoring
• Security forensics: determining how an attacker compromised a given host
• Network troubleshooting: inspecting the precursors to a fault after it has occurred
• Event correlation: a NIDS can analyze past events that did not seem “interesting” until more recent traffic hinted at their relevance
Problems
• Working with raw packets (full contents, not only headers) raises several problems
• Storage constraints: in many operational environments it is infeasible to capture the entire traffic stream because of its enormous volume
• Filtering is hard to set up: it is difficult to decide beforehand which context will turn out to be relevant when investigating an incident in retrospect
• Filtering is still technically problematic in high-speed networks (n TB per day)
• Data retrieval is like finding a needle in a haystack: time-consuming and cumbersome
Related Work
• Brute-force bulk recording: feasible only in low-volume environments
• Recording only the packets that trigger alerts: does not support retrospective analysis of a problematic host’s earlier activity
• Sampling: might lose important evidence
• Data abstraction: provides less information
Objective
• Design and implement a packet recording system, the “Time Machine”
• Use dynamic packet filtering and buffering to enable effective recording of large traffic streams
• Retain nearly complete historic data for several days
• Allow users to conveniently “travel back in time”
• Example application: a forensic tool that extracts detailed past information about unusual activities once they are detected
The Approach
• Key observation: network traffic follows a “heavy-tailed” distribution
• Most network connections are quite short
• Only a small number of large connections account for the bulk of the total volume
• In most attacks, the compromise happens at the beginning
• For forensics and troubleshooting, the beginning of a large connection contains the most significant information
The Approach (cont’d)
• Exploit the “heavy-tailed” nature of traffic to partition the stream into a small subset of high interest and a large remainder of low interest
• Record the small subset and discard the rest
• Cutoff limit N: for every connection, the Time Machine buffers up to the first N bytes of traffic
• This greatly reduces the traffic that must be buffered
• It retains full context for small connections and the beginning of large connections
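The cutoff idea can be sketched in a few lines of Python. This is a minimal illustration, not the Time Machine's implementation: the function name `apply_cutoff` and the `(conn_id, length)` packet representation are assumptions made for the example, and the sketch keeps any packet that starts below the cutoff, so a connection may overshoot N by up to one packet.

```python
from collections import defaultdict


def apply_cutoff(packets, cutoff_bytes):
    """Return the packets that a per-connection cutoff would keep.

    packets: iterable of (conn_id, length) tuples in arrival order.
    A packet is kept while its connection has seen fewer than
    cutoff_bytes so far (simplification: the last kept packet may
    push the connection slightly past the cutoff).
    """
    seen = defaultdict(int)  # bytes observed so far per connection
    kept = []
    for conn_id, length in packets:
        if seen[conn_id] < cutoff_bytes:
            kept.append((conn_id, length))
        seen[conn_id] += length
    return kept


# Hypothetical traffic: one long connection "a" and one short connection "b".
example = [("a", 600), ("a", 600), ("a", 600), ("b", 100)]
kept_packets = apply_cutoff(example, cutoff_bytes=1000)
```

With a 1000-byte cutoff, the short connection "b" is kept in full while only the first two packets of "a" survive, which is the behaviour the slide describes for small and large connections.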
Design Goals for the Time Machine
• Provide raw packet data
• Buffer traffic comprehensively
• Prioritize traffic
• Automated resource management
• Efficient and flexible retrieval
• Suitable for high-volume environments using commodity hardware
Environments
• MWN (Munich Scientific Research Network, Munich, Germany): about 50,000 hosts; 2 TB/day; 15-20% FTP traffic; 350 Mbps (68 Kpps) at busy hour
• LBNL (Lawrence Berkeley National Laboratory, California, USA): about 9,000 hosts and 4,000 users; 320 Mbps (37 Kpps) at busy hour
• NERSC (National Energy Research Scientific Computing Center): about 600 hosts and 2,000 users, dominated by large transfers; 260 Mbps (43 Kpps) at busy hour
Datasets
• Connection-level summaries covering one week, collected by the Bro NIDS
• MWN: 355 million connections (from 2004/10/18)
• LBNL: 22 million connections (from 2005/2/7)
• NERSC: 4 million connections (from 2005/4/29)
• These logs capture the nature of each environment at a much lower volume than full packet-level data
• A packet-buffer model simulates packet-level communication from these logs to evaluate the memory requirements of a Time Machine
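Heavy-tailed Distribution and the Cutoff
[Figure: connection size distribution, log-log scaled, with the cutoff at 20 KB]
• About 90% of connections fall below the 20 KB cutoff (record)
• About 10% of connections exceed the cutoff (discard)
• LBNL: 12% of connections exceed the cutoff and carry 96% of the bytes
• NERSC: 14% of connections exceed the cutoff and carry 99.86% of the bytes
• MWN: 15% of connections exceed the cutoff and carry 87% of the bytes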
Evaluate the Memory Requirements
• Eviction time Te: how long the buffer stores each connection’s data
• Goal: a value of Te on the order of days rather than minutes
• Vary the cutoff N and the eviction time Te to evaluate the feasibility of a Time Machine
• Result: with a cutoff of 10-20 KB, buffering several days of traffic is practical
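To make the roles of N and Te concrete, here is a deliberately simplified simulation sketch. It is not the packet-buffer model used in the study: it assumes every connection delivers all of its retained bytes at its start time, and the function name `peak_buffer_bytes` and the `(start_time, total_bytes)` record layout are hypothetical.

```python
def peak_buffer_bytes(connections, cutoff, eviction_time):
    """Peak buffer occupancy for connection-level records.

    connections: iterable of (start_time, total_bytes).
    Each connection occupies min(total_bytes, cutoff) bytes from its
    start until start + eviction_time (assumption: all retained bytes
    arrive at the start of the connection).
    """
    events = []
    for start, size in connections:
        stored = min(size, cutoff)
        events.append((start, stored))                    # data enters buffer
        events.append((start + eviction_time, -stored))   # data is evicted
    # Tuples sort by time first; at equal times negative deltas come
    # first, so evictions are processed before new arrivals.
    events.sort()
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


# Hypothetical records: (start time in seconds, connection size in bytes).
records = [(0, 50_000), (10, 5_000), (100, 1_000_000)]
with_cutoff = peak_buffer_bytes(records, cutoff=20_000, eviction_time=60)
without_cutoff = peak_buffer_bytes(records, cutoff=float("inf"), eviction_time=60)
```

In this toy input the single large connection dominates the peak when no cutoff is applied, while the cutoff caps its contribution at N, mirroring the effect the feasibility study measures on real logs.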
Required Memory for LBNL
[Figure: required buffer memory over time for LBNL; annotated values: 68 GB, 64 GB, 5th day]
• The cutoff increases the duration of data availability by a factor of 32 (3 h vs. 4 d)
• Memory use stops increasing after 4 days because of the eviction time Te
Required Memory for NERSC
[Figure: required buffer memory over time for NERSC; annotated values: 344 GB, 14.9 GB]
• NERSC has a large proportion of high-volume traffic (14% of connections carry 99.86% of the bytes)
• Without a cutoff, the volume is spiky
• Te alone does little to limit the volume because of the intermittent bursts of traffic
Required Memory for MWN
• MWN has a lower fraction of bytes in the larger connections (15% of connections carry 87% of the bytes)
• The gain from the cutoff is not quite as large, likely due to the larger fraction of HTTP traffic
Time Machine System Architecture
4 main functions:
1. Buffering traffic using a cutoff
2. Migrating the buffered packets to disk and managing the associated storage
3. Providing flexible retrieval of subsets of the packets
4. Enabling customization
Two-thread Architecture
• Separates user interaction from recording so that packet capture has higher priority than packet retrieval
Packet Capture
• The capture unit receives packets from the network tap and passes them on to the classification unit
• It uses the libpcap packet capture library to collect and store each packet’s full content and capture timestamp
• libpcap lets the user specify a kernel-level BPF (BSD Packet Filter) capture filter to discard “uninteresting” traffic as early as possible
Classification
• The classification unit divides the incoming packet stream into user-defined classes
• It assigns packets to different storage containers based on their class
• It monitors the cutoff with the help of the connection tracking unit, which keeps per-connection statistics and checks whether the connection a packet belongs to has exceeded its cutoff threshold
• Example: the “telnet” class definition specifies a name, a BPF filter (rule), a priority, a cutoff, and memory and disk buffer sizes
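A class definition with the fields listed above could be modelled roughly as follows. This is a sketch under stated assumptions: the `TrafficClass` type, the Python predicate standing in for a BPF filter, every numeric value, and the rule that the highest-priority matching class wins are all illustrative choices, not the Time Machine's actual configuration syntax.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TrafficClass:
    name: str
    matches: Callable[[dict], bool]  # stand-in for a BPF filter expression
    priority: int
    cutoff_bytes: int
    mem_bytes: int
    disk_bytes: int


def classify(packet: dict, classes: list) -> Optional[TrafficClass]:
    """Pick the matching class with the highest priority (assumed rule)."""
    candidates = [c for c in classes if c.matches(packet)]
    return max(candidates, key=lambda c: c.priority) if candidates else None


# All values below are hypothetical and chosen only for illustration.
telnet = TrafficClass(
    name="telnet",
    matches=lambda p: p.get("proto") == "tcp" and 23 in (p.get("sport"), p.get("dport")),
    priority=50,
    cutoff_bytes=10_000_000,
    mem_bytes=10_000_000,
    disk_bytes=1_000_000_000,
)
default_tcp = TrafficClass(
    name="tcp",
    matches=lambda p: p.get("proto") == "tcp",
    priority=10,
    cutoff_bytes=20_000,
    mem_bytes=100_000_000,
    disk_bytes=10_000_000_000,
)
classes = [default_tcp, telnet]
```

A packet to port 23 matches both classes here and lands in "telnet" because of its higher priority, which lets a user give an interesting service a more generous cutoff than bulk TCP traffic.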
Storage Containers
• The architecture supports customization by splitting the overall storage into several storage containers
• Each storage container stores a subset of the packets, chosen according to the user-defined classes, within its resource limits (memory and disk)
• The RAM and disk buffers are implemented as two ring buffers
• Packets evicted from the RAM buffer migrate to the disk buffer and are eventually deleted
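The two-stage buffering can be illustrated with a small in-memory model. It is only a sketch: the `StorageContainer` class and its byte budgets are hypothetical, and the "disk" buffer is a second in-memory queue standing in for files on disk.

```python
from collections import deque


class StorageContainer:
    """Toy model of a RAM ring buffer that spills into a disk ring buffer."""

    def __init__(self, mem_budget, disk_budget):
        self.mem_budget = mem_budget
        self.disk_budget = disk_budget
        self.mem = deque()
        self.disk = deque()  # stand-in for on-disk storage
        self.mem_used = 0
        self.disk_used = 0

    def store(self, timestamp, packet):
        """Add a packet; the oldest RAM packets migrate to disk when RAM is full."""
        self.mem.append((timestamp, packet))
        self.mem_used += len(packet)
        while self.mem_used > self.mem_budget:
            oldest = self.mem.popleft()
            self.mem_used -= len(oldest[1])
            self._migrate(oldest)

    def _migrate(self, entry):
        """Append to the disk buffer; the oldest disk packets are deleted when full."""
        self.disk.append(entry)
        self.disk_used += len(entry[1])
        while self.disk_used > self.disk_budget:
            dropped = self.disk.popleft()
            self.disk_used -= len(dropped[1])


# Five 5-byte packets into a container with 10 bytes of RAM and 10 bytes of disk.
container = StorageContainer(mem_budget=10, disk_budget=10)
for ts in range(1, 6):
    container.store(ts, bytes(5))
```

After the loop the two newest packets sit in RAM, the two before them on "disk", and the oldest one has been deleted, which is the eviction order the slide describes.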
Indexing
• For efficient retrieval, the Time Machine keeps an index across all packets stored in all storage containers
• Each index manages a list of time intervals for every unique key value
• The interval [Tstart, Tend] for a key is updated on each incoming packet with that key
• The intervals indicate whether packets with that key value are available in a given storage container and from what starting timestamp
• A lookup scans linearly through the intervals it gets from the index
• Multiple indexes: any number of indexes over an arbitrary set of protocol header fields is supported
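One way to picture the per-key interval lists is the sketch below. The `IntervalIndex` class is hypothetical, and so is its merge rule: a packet extends the current interval if it arrives within `max_gap` seconds of the interval's end, otherwise it opens a new interval. The slides do not spell out how intervals are opened and closed, so treat `max_gap` as an assumption of this example. Timestamps are assumed to arrive in non-decreasing order.

```python
from collections import defaultdict


class IntervalIndex:
    """Maps each key value to a list of [t_start, t_end] intervals."""

    def __init__(self, max_gap):
        self.max_gap = max_gap              # hypothetical merge threshold
        self.intervals = defaultdict(list)  # key -> [[t_start, t_end], ...]

    def add(self, key, timestamp):
        """Record a packet with this key at the given capture timestamp."""
        spans = self.intervals[key]
        if spans and timestamp - spans[-1][1] <= self.max_gap:
            spans[-1][1] = timestamp        # extend the open interval
        else:
            spans.append([timestamp, timestamp])

    def lookup(self, key):
        """Return the intervals in which packets with this key were seen."""
        return [tuple(span) for span in self.intervals.get(key, [])]


# Hypothetical index keyed on source IP address.
index = IntervalIndex(max_gap=5)
for ts in (1, 3, 20, 22):
    index.add("10.0.0.1", ts)
```

A lookup for "10.0.0.1" then yields two intervals, one per burst of activity, so a query only has to scan those parts of the stored packets.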
Query Processing
• Provides a flexible language to express queries for subsets of the packets
• Each query consists of a logical combination of time ranges, keys, and an optional BPF filter
• Check the index to obtain the time ranges for the query
• Locate those time ranges in the storage containers using binary search
• Scan all packets in the identified time ranges and check whether they match the query
• Write the results to a tcpdump trace file on disk
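The lookup steps above can be sketched with Python's `bisect` module. This is an illustration only: the `run_query` function, the dictionary packet representation, and the predicate standing in for the key and BPF match are assumptions, and the final step of writing a tcpdump trace file is omitted.

```python
from bisect import bisect_left, bisect_right


def run_query(packets, intervals, predicate):
    """Return packets inside the given time intervals that match the predicate.

    packets: list of (timestamp, packet) sorted by timestamp.
    intervals: list of (t_start, t_end) pairs, e.g. taken from an index.
    predicate: function deciding whether a packet matches the query.
    """
    timestamps = [ts for ts, _ in packets]
    results = []
    for t_start, t_end in intervals:
        lo = bisect_left(timestamps, t_start)   # first packet at or after t_start
        hi = bisect_right(timestamps, t_end)    # one past the last packet at or before t_end
        results.extend(p for _, p in packets[lo:hi] if predicate(p))
    return results


# Hypothetical stored packets and a query for source "a" within [1, 5].
stored = [
    (1, {"src": "a"}),
    (2, {"src": "b"}),
    (5, {"src": "a"}),
    (9, {"src": "a"}),
]
matches = run_query(stored, [(1, 5)], lambda p: p["src"] == "a")
```

Binary search narrows each interval to a slice of the buffer, so only packets inside the indexed time ranges are scanned and compared against the query.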
User Interface
• Lets the user configure the recording parameters: classification rules, cutoff, storage management, indexing Δt, etc.
• Issues queries to the query processing unit to retrieve subsets of the recorded packets
Evaluation in LBNL
• Configuration: 3 classes, each with a 20 KB cutoff: TCP (90 GB), UDP (30 GB), and others (10 GB)
• Retention: the distance back in time to which we can travel at any particular moment
• Retention increases after the Time Machine starts until the disk buffers have filled
• Retention correlates with the incoming bandwidth for each class and its variations due to diurnal and weekly effects
Evaluation
• LBNL: 98% of the traffic is discarded; the remainder imposes an average (maximum) rate of 300 KB/s (2.6 MB/s); over the 2 weeks of operation, libpcap reported only 0.016% of all packets dropped
• MWN: 85% of the traffic is discarded; the average (maximum) rate is 3.5 MB/s (13.9 MB/s); MWN carries a larger volume of HTTP traffic
• Open issue: the classification and cutoff mechanisms need to be exploited more aggressively to manage the large fraction of HTTP traffic appropriately
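As a back-of-the-envelope check on retention, a full ring buffer covers roughly its size divided by the incoming rate. The sketch below applies that to the LBNL figures from the slides (90 + 30 + 10 GB of buffers, 300 KB/s average rate after the cutoff). It is a rough aggregate estimate, not a measured result: it assumes decimal units, a constant rate, and a single shared buffer, whereas the real system keeps separate buffers per class with different rates.

```python
def retention_days(buffer_bytes, rate_bytes_per_s):
    """Days of traffic a full ring buffer holds at a constant fill rate."""
    return buffer_bytes / rate_bytes_per_s / 86_400


# Slide figures, assuming 1 GB = 1e9 bytes and 1 KB = 1e3 bytes.
total_buffer = (90 + 30 + 10) * 1e9  # TCP + UDP + others
average_rate = 300 * 1e3             # LBNL average rate after the cutoff
estimate = retention_days(total_buffer, average_rate)
```

Under these assumptions the estimate comes out at about 5 days, which is in line with the goal of retaining several days of traffic.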
Conclusion
• Proposes the concept of a Time Machine for efficient network packet recording and retrieval
• Relies on the “heavy-tailed” nature of network traffic: record most connections in their entirety and skip the bulk of the total volume
• The Time Machine can buffer several days of raw high-volume traffic using commodity hardware
• It provides an efficient query interface
• It automatically manages its available storage
• A trace-driven simulation and real-world operation demonstrate its effectiveness