This paper presents Network DVR, a programmable framework for application-aware trace collection. It enables users to program packet recording based on their own regular expression rules, allowing for selective recording of content and reducing false positives.
Network DVR: A Programmable Framework for Application-Aware Trace Collection
Chia-Wei Chang*, Alex Gerber+, Bill Lin*, Shubho Sen+, Oliver Spatscheck+
*University of California, San Diego
+AT&T Labs-Research
April 9, 2010
Introduction
• Network traces are essential for a wide range of network applications:
  • e.g., traffic analysis, network measurement, performance monitoring, security analysis …
• Existing capture tools typically focus on recording packets based on simple packet header rules (e.g., port numbers):
  • Often only capture packet headers, not content
• However, for certain applications, operators would like to record content selectively based on application requirements
  • Recording all packets is not practical
PAM 2010 Zurich, Switzerland
Motivating Application Example
• Consider Snort for intrusion detection
  • Snort can identify when an intrusion has occurred and raise the alarm
  • e.g., for the following truncated Snort FTP policy rule:
    User-Agent:[^\n\r]+WebshotsNetClient
    where [^\n\r]+ means one or more of any characters except newline or carriage return
• Suppose we would like to record the character sequence that raised the alarm. What is the challenge?
Challenges
• “False positive” problem
  • Naïve approach: record from the beginning
  • Most initial partial matches eventually lead to “false positives”, consuming resources unnecessarily
  • e.g., for User-Agent:[^\n\r]+WebshotsNetClient, the input may match User-Agent: in the beginning, but eventually fail to match WebshotsNetClient
• Packet recording at line speed is very resource intensive
  • Need a better way to start/stop recording
• Different applications have different recording requirements
  • Need flexible programmability to capture application-specific “Content-of-Interest”
Impact of the False Positive Problem
• Example experiment
  • Real data collected on a trunk of four Gigabit Ethernet links (4 Gb/s aggregate) at a large enterprise gateway over a 60-minute period
  • Consider all FTP regular expression rules from Snort 2007

Recording approach                                                  Amount of memory copies
Using only IP-address and TCP-port filtering                        900 MB
Eager recording (begin immediately when a rule starts to match)     38 MB
Actual relevant recording                                           31 KB

• Eager vs. actual recording ratio: > 1200x
• Most initial recordings lead to false positives
Our Solution: Network DVR
• Loosely analogous to the concept of a “Digital Video Recorder” (DVR) for TV recording, where users can program what content to record …
• Network DVR enables users to program the packet recorder using their own regular expression rules
  • e.g., when to start recording, when to stop …
• Network DVR uses a concept of “Triggered-Recording” to minimize the impact of false positives
Triggered-Recording Example
• Suppose we want to record all domain names corresponding to HTTP requests to educational URLs
  • i.e., those of the form http://.*\.edu, for example for tracking the top 100 most popular educational web sites
• Basic idea of Triggered-Recording
  • Define 3 classes of matching rules: Start, Abort, and Final
  • Start rule: e.g., http://
    • Only start recording after some “start rule” has been matched; this avoids false starts when only “h” or “ht” has been encountered, etc.
  • Abort rule: e.g., \.com and \.org
    • Stop recording if some “abort rule” has been encountered; this enables early garbage collection
  • Final rule: e.g., \.edu
    • Stop recording and flush the recorded character sequence to disk
Triggered-Recording Example (cont’d)
• Suppose we also want to record character sequences that match the following signature:
  • DEL[^\r\n]*ATT
• Corresponding rules
  • Start rule: DEL
  • Abort rule: \r and \n
  • Final rule: ATT
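The start/abort/final behavior in the two examples can be sketched for a single signature as follows. This is an illustrative sketch only, not the Network DVR implementation: it assumes literal rules matched with plain suffix checks instead of a DFA, it assumes the recorded sequence includes the start rule text, and the name triggered_record is hypothetical.

```python
def triggered_record(stream, start, aborts, final):
    """Record substrings that begin at a start match and end at a final match.

    Assumption: all rules are literal strings (no regex), checked as suffixes
    of the input seen so far. The real system uses a DFA instead.
    """
    results = []          # flushed recordings
    recording = False     # is a recording currently active?
    buf = []              # symbols copied for the active recording
    window = max(len(r) for r in [start, final, *aborts])
    seen = ""             # last `window` symbols of the input
    for ch in stream:
        seen = (seen + ch)[-window:]
        if recording:
            buf.append(ch)                      # copy only while recording
        if recording and seen.endswith(final):
            results.append("".join(buf))        # final rule: flush
            recording, buf = False, []
        elif recording and any(seen.endswith(a) for a in aborts):
            recording, buf = False, []          # abort rule: discard early
        elif not recording and seen.endswith(start):
            recording, buf = True, list(start)  # start rule: begin recording
    return results


edu = triggered_record(
    "GET http://www.ucsd.edu/ and http://example.com/ then http://mit.edu",
    "http://", [".com", ".org"], ".edu")
ftp = triggered_record("xDELabcATT\r\nDELfoo\nATT", "DEL", ["\r", "\n"], "ATT")
```

Nothing is copied for the stray "h" in "then", and the example.com request is discarded as soon as ".com" is seen, which is the early garbage collection the abort rule is meant to provide.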
Features to Design Network DVR
• A Deterministic Finite Automaton (DFA) is used to match incoming traffic against the 3 sets of trigger rules
  • Accept states correspond to matched rules
• Extend the basic DFA to control the packet recorder and to “remember” the state of the packet recorder (e.g., whether the recorder is currently recording)
• Consider matches for a flow across packet boundaries (need to maintain a flow table and each flow's state in the DFA)
1. Trigger Rules Creation
• Start, abort, and final trigger rules are referred to as α, β, and γ rules and are grouped into 3 sets Ωα, Ωβ, and Ωγ, respectively

Application-specific signature    Start rule      Abort rules                 Final rule
http://.*\.edu                    α1 = http://    β11 = \.com, β12 = \.org    γ1 = \.edu
DEL[^\r\n]*ATT                    α2 = Del        β21 = \r, β22 = \n          γ2 = ATT
2. Construct DFA to Implement Matching
• Each accept state corresponds to 1 or more matched rules from the 3 rulesets
[Figure: compiled DFA with states 0 to 25; transitions labeled “from any state” lead into the path of each rule. Accept states and their matched rules: 7: α1 (http://), 11: γ1 (\.edu), 14: β11 (\.com), 17: β12 (\.org), 20: α2 (Del), 23: γ2 (ATT), 24: β21 (\r), 25: β22 (\n)]
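Because the trigger rules in this example are all literal strings, a single automaton that matches them simultaneously can be sketched with the Aho-Corasick construction. This is a stand-in chosen for illustration, not the DFA compiler used in Network DVR (which handles general regular expressions); the rule labels start1, abort11, and so on are hypothetical names for the start, abort, and final rules.

```python
from collections import deque

# Hypothetical labels for the start, abort and final rules of the two signatures.
RULES = {
    "start1": "http://", "abort11": ".com", "abort12": ".org", "final1": ".edu",
    "start2": "DEL", "abort21": "\r", "abort22": "\n", "final2": "ATT",
}

def build_matcher(rules):
    """Build an Aho-Corasick automaton over literal rules (label -> string)."""
    goto = [{}]     # goto[state][symbol] -> next state
    out = [set()]   # rule labels accepted in each state
    for label, pattern in rules.items():
        s = 0
        for ch in pattern:
            if ch not in goto[s]:
                goto.append({})
                out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(label)
    fail = [0] * len(goto)            # fallback state on a mismatch
    queue = deque(goto[0].values())
    while queue:                      # breadth-first over the trie
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] |= out[fail[t]]    # inherit rules ending at the fallback
    return goto, fail, out

def scan(text, matcher):
    """Return (index of last matched symbol, rule label) for every match."""
    goto, fail, out = matcher
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        hits.extend((i, label) for label in sorted(out[s]))
    return hits

matcher = build_matcher(RULES)
hits = scan("http://x.edu", matcher)
```

Each input symbol advances one shared automaton state, so all eight rules are checked in a single pass over the traffic.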
3. Trigger Recording Behavior
• For each signature MAi = {αi, βi1, …, βij, γi1, …, γik}, there is a corresponding variable vi that gets set when its corresponding start rule has been matched
• Upon encountering an abort or final trigger rule, we check whether there is an active recording for this rule by testing vi. If it is set, the recording is aborted or flushed, respectively, and vi is reset. If it is not set, the matched abort or final trigger rule is ignored
[Figure: the DFA annotated with actions at its accept states, for matching indices MA1 and MA2]
  • αi matched (α1 = http://, α2 = Del): set vi; start recording
  • γi matched (γ1 = \.edu, γ2 = ATT): if (vi): reset vi; flush recording
  • βij matched (β11 = \.com, β12 = \.org, β21 = \r, β22 = \n): if (vi): reset vi; abort recording
3. Trigger Recording Issues
• How to efficiently examine the trigger condition?
  • To test whether vi is set for flow f, we can perform a hash lookup on the key f:vi, where the key is constructed by combining the flow ID f and the variable name vi
  • To set or delete vi for flow f, we can perform a hash insert (or lookup-then-insert) or a hash delete with the key f:vi
• What if multiple recording requests are triggered?
  • If each triggered signature kept its own recording, the worst-case bound on memory bandwidth/processing time would be O(N), where N = total number of signatures
  • Network DVR uses a single aggregated recording string for each flow to guarantee only one memory copy for each incoming symbol
  • By logging the recording-begin/end memory positions of each valid matching result in the aggregated recording string, the system can output all recorded matching strings for each application-specific signature
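The two ideas above, the f:vi hash key and the single aggregated recording string per flow, can be sketched together as follows. This is a minimal sketch under stated assumptions: the class FlowRecorder and its method names are hypothetical, a Python tuple (flow_id, sig_id) stands in for the combined key f:vi, and recording is assumed to begin with the symbol after the start rule match.

```python
class FlowRecorder:
    """Per-flow trigger variables and one aggregated recording string per flow."""

    def __init__(self):
        self.begin = {}    # (flow_id, sig_id) -> begin offset; presence means v_i is set
        self.buffer = {}   # flow_id -> aggregated recording string (list of symbols)
        self.count = {}    # flow_id -> number of active recordings

    def copy_symbol(self, flow_id, ch):
        # At most one copy per incoming symbol, however many signatures record.
        if self.count.get(flow_id, 0) > 0:
            self.buffer[flow_id].append(ch)

    def start(self, flow_id, sig_id):
        key = (flow_id, sig_id)
        if key in self.begin:                  # hash lookup: v_i already set
            return
        buf = self.buffer.setdefault(flow_id, [])
        self.begin[key] = len(buf)             # hash insert: log the begin position
        self.count[flow_id] = self.count.get(flow_id, 0) + 1

    def final(self, flow_id, sig_id):
        begin = self.begin.pop((flow_id, sig_id), None)   # hash delete resets v_i
        if begin is None:
            return None                        # no active recording: ignore the rule
        text = "".join(self.buffer[flow_id][begin:])
        self._release(flow_id)
        return text

    def abort(self, flow_id, sig_id):
        if self.begin.pop((flow_id, sig_id), None) is not None:
            self._release(flow_id)

    def _release(self, flow_id):
        self.count[flow_id] -= 1
        if self.count[flow_id] == 0:
            self.buffer[flow_id].clear()       # nothing references the string any more


rec = FlowRecorder()
rec.start("f1", "s1")
for ch in "abc":
    rec.copy_symbol("f1", ch)
rec.start("f1", "s2")                          # second signature triggers on the same flow
for ch in "de":
    rec.copy_symbol("f1", ch)
inner = rec.final("f1", "s2")
rec.copy_symbol("f1", "f")
outer = rec.final("f1", "s1")
```

Both recordings share the same stored symbols; only their begin positions differ, so overlapping signatures do not multiply the number of memory copies.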
4. Memory Management
• Need constant-time memory operations
[Figure: a flow table (entries such as Fid 1, Fid 5, and Fid 108, each holding a DFA state and a pointer to its recorded memory cells, or Null) alongside a free list used for memory allocation. Valid recording: send to the output queue for flushing to disk, then recycle the memory cells to the free list. Abort recording: recycle the memory cells]
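One way to obtain constant-time allocation and recycling is a fixed pool of cells threaded into linked lists, sketched below. The layout is an assumption made for illustration (the slide shows only a flow table, memory cells, and a free list); the names CellPool and Recording are hypothetical.

```python
class CellPool:
    """Fixed pool of one-symbol cells linked through a free list."""

    def __init__(self, size):
        self.data = [None] * size
        self.next = list(range(1, size)) + [-1]   # -1 marks the end of a chain
        self.free = 0 if size > 0 else -1         # head of the free list

    def alloc(self, ch):
        cell = self.free
        if cell == -1:
            raise MemoryError("pool exhausted")
        self.free = self.next[cell]               # O(1): pop the free-list head
        self.data[cell] = ch
        self.next[cell] = -1
        return cell


class Recording:
    """A chain of cells holding one flow's recorded symbols."""

    def __init__(self, pool):
        self.pool, self.head, self.tail = pool, -1, -1

    def append(self, ch):
        cell = self.pool.alloc(ch)
        if self.head == -1:
            self.head = cell
        else:
            self.pool.next[self.tail] = cell      # O(1): link after the tail
        self.tail = cell

    def text(self):
        out, cell = [], self.head
        while cell != -1:
            out.append(self.pool.data[cell])
            cell = self.pool.next[cell]
        return "".join(out)

    def recycle(self):
        # O(1) regardless of length: splice the whole chain onto the free list.
        if self.head != -1:
            self.pool.next[self.tail] = self.pool.free
            self.pool.free = self.head
            self.head = self.tail = -1


pool = CellPool(4)
first = Recording(pool)
for ch in "edu":
    first.append(ch)
flushed = first.text()    # a valid recording would be sent to the output queue here
first.recycle()           # an aborted recording is recycled the same way
second = Recording(pool)
for ch in "abcd":
    second.append(ch)
```

Keeping a tail pointer per recording is what makes recycling independent of the recording length: an aborted recording of any size returns to the free list with two pointer updates.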
Evaluation Setup
• Use real data traces from a large ISP
  • Traces collected on a trunk with four Gigabit Ethernet links at a large enterprise Internet gateway, using IP filtering
  • Partitioned into 10 datasets of 60-minute intervals
  • Each dataset has approx. 3,500 flows
  • For each dataset, with the given application-specific signatures, we replayed the complete trace and calculated the number of memory copies that Network DVR needed vs. an eager approach in which recording starts when the first character of a regular expression is matched
• Use practical signatures from Snort
  • Use Snort 2007 FTP signatures (58 regexes) to evaluate the amount of unnecessary memory copies that Network DVR can eliminate by using the proposed triggered-recording concept
• Use the public-domain DFA implementation provided by Becchi and Crowley [CoNEXT 2008] as the matching module in Network DVR
Evaluation Results
• Comparison of memory copy times
  • “data” shows the size of the total incoming symbols
  • “eager” begins copying symbols to memory when the first character of a regular expression is matched
  • “netDVR” uses our triggered-recording approach
  • “actual” shows the memory needed to record the details of matching results (Snort signatures)
Evaluation Results (cont’d)
• Comparison of memory copy times
[Figure: memory copy times (log scale) across the data sets]
  • Reduction = eager/netDVR: 500-800x
  • Overhead = netDVR/actual: only 1.48-1.6x
Conclusions
• Proposed Network DVR as a programmable application-aware packet recording system
• Proposed the Triggered-Recording concept to minimize the impact of false positive recordings (partial recordings that will eventually fail)
• Experimental results on real data sets from a large enterprise gateway demonstrate a 500-800x reduction in memory copies
Thank You