Network DVR: A Programmable Framework for Application-Aware Trace Collection

Chia-Wei Chang*, Alex Gerber+, Bill Lin*, Shubho Sen+, Oliver Spatscheck+. *University of California, San Diego; +AT&T Labs-Research. April 9, 2010.

Presentation Transcript


  1. Network DVR: A Programmable Framework for Application-Aware Trace Collection. Chia-Wei Chang*, Alex Gerber+, Bill Lin*, Shubho Sen+, Oliver Spatscheck+ (*University of California, San Diego; +AT&T Labs-Research). April 9, 2010

  2. Introduction
     • Network traces are essential for a wide range of network applications:
       • e.g., traffic analysis, network measurement, performance monitoring, security analysis …
     • Existing capture tools typically focus on recording packets based on simple packet header rules (e.g., port numbers):
       • Often only capture packet headers, not content
     • However, for certain applications, operators would like to record content selectively based on application requirements
     • Recording all packets is not practical
     PAM 2010 Zurich, Switzerland

  3. Motivating Application Example
     • Consider Snort for intrusion detection
       • Snort can identify when an intrusion has occurred and raise the alarm
       • e.g., for the following truncated Snort FTP policy rule: User-Agent:[^\n\r]+WebshotsNetClient
         (here [^\n\r]+ means 1 or more of any characters except newline or carriage return)
     • Suppose we would like to record the character sequence that raised the alarm. What is the challenge?

  4. Challenges
     • “False positive” problem
       • Naïve approach: record from the beginning
       • Most initial partial matches eventually lead to “false positives”, consuming resources unnecessarily
       • e.g., for User-Agent:[^\n\r]+WebshotsNetClient, the input may match User-Agent: in the beginning but eventually fail to match WebshotsNetClient
     • Packet recording at line speed is very resource intensive
       • Need a better way to start/stop recording
     • Different applications have different recording requirements
       • Need flexible programmability to capture application-specific “Content-of-Interest”

  5. Impact of False Positive Problem
     • Example experiment
       • Real data collected on a trunk of four Gigabit Ethernet links (4 Gb/s aggregate) at a large enterprise gateway over a 60-minute period
       • Consider all FTP regular expression rules from Snort 2007

     Recording approach                                                 Amount of memory copies
     Using only IP-address and TCP-port filtering                       900 MB
     Eager recording (begin immediately when a rule starts to match)    38 MB
     Actual relevant recording                                          31 KB
     Eager vs. actual recording ratio                                   > 1200x

     • Most initial recordings lead to false positives
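
The eager-to-actual ratio on this slide can be checked with a few lines of arithmetic. The sketch below takes the 38 MB and 31 KB figures at face value and tries both decimal and binary unit conventions, since the slide does not say which one it uses; the function name `ratio` is only illustrative.

```python
# Sizes as reported on the slide; the unit convention is an assumption.
EAGER_MB = 38
ACTUAL_KB = 31


def ratio(kb_per_mb):
    """Eager-to-actual ratio for a given number of KB per MB."""
    return EAGER_MB * kb_per_mb / ACTUAL_KB


decimal_ratio = ratio(1000)  # assuming 1 MB = 1000 KB
binary_ratio = ratio(1024)   # assuming 1 MB = 1024 KB
print(round(decimal_ratio), round(binary_ratio))
```

Under either convention the ratio is above 1200, which agrees with the "> 1200x" entry.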

  6. Our Solution: Network DVR
     • Loosely analogous to the concept of a “Digital Video Recorder” (DVR) for TV recording, where users can program what content to record …
     • Network DVR enables users to program the packet recorder using their own regular expression rules
       • e.g., when to start recording, when to stop …
     • Network DVR uses a concept of “Triggered-Recording” to minimize the impact of false positives

  7. Triggered-Recording Example
     • Suppose we want to record all domain names corresponding to HTTP requests to educational URLs
       • i.e., those of the form http://.*\.edu, for example for tracking the top 100 most popular educational web sites
     • Basic idea of Triggered-Recording
       • Define 3 classes of matching rules: Start, Abort, and Final
       • Start rule: e.g., http://
         • Only start recording after some “start rule” has been matched; this avoids false starts when only “h” or “ht” has been encountered, etc.
       • Abort rule: e.g., \.com and \.org
         • Stop recording if some “abort rule” has been encountered; this enables early garbage collection
       • Final rule: e.g., \.edu
         • Stop recording and flush the recorded character sequence to disk
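
The start/abort/final idea can be sketched for a single signature as follows. This is a minimal illustration, not the authors' implementation: the function name `triggered_record` and the sample URLs are hypothetical, the rules are treated as literal strings rather than regular expressions, and the recorded content is assumed to be everything after the start rule up to and including the final rule.

```python
def triggered_record(data, start, aborts, finals):
    """Return (flushed sequences, number of characters copied)."""
    rules = [start, *aborts, *finals]
    window = max(len(rule) for rule in rules)
    tail = ""      # last `window` characters seen, used for suffix tests
    buffer = None  # None means the recorder is idle
    flushed, copies = [], 0
    for ch in data:
        tail = (tail + ch)[-window:]
        if buffer is not None:
            buffer.append(ch)  # one memory copy per recorded character
            copies += 1
        if tail.endswith(start):
            buffer = []  # start rule matched: begin a fresh recording
        elif buffer is not None and any(tail.endswith(f) for f in finals):
            flushed.append("".join(buffer))  # final rule: flush
            buffer = None
        elif buffer is not None and any(tail.endswith(a) for a in aborts):
            buffer = None  # abort rule: discard the recording early
    return flushed, copies


sample = "GET http://www.ucsd.edu/x http://www.google.com/ http://mit.edu"
result = triggered_record(sample, "http://", [".com", ".org"], [".edu"])
print(result)
```

Nothing is copied while only "h" or "ht" has been seen, and the recording of the .com request is dropped as soon as the abort rule matches.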

  8. Triggered-Recording Example (cont’d)
     • Suppose we also want to record character sequences that match the following signature:
       • DEL[^\r\n]*ATT
     • Corresponding rules
       • Start rule: DEL
       • Abort rule: \r and \n
       • Final rule: ATT
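
To see why \r and \n serve as abort rules for this signature, it may help to run the full regular expression with Python's standard `re` module. The helper name `match_signature` and the sample payloads are made up for illustration.

```python
import re

# The signature from the slide, compiled as an ordinary regular expression.
SIGNATURE = re.compile(r"DEL[^\r\n]*ATT")


def match_signature(payload):
    """Return the matched character sequence, or None if there is no match."""
    found = SIGNATURE.search(payload)
    return found.group(0) if found else None


same_line = match_signature("xxDELETE fooATTzz")
broken_by_newline = match_signature("DEL\r\nATT")
print(same_line, broken_by_newline)
```

Once a carriage return or newline appears after DEL, the character class `[^\r\n]*` can no longer extend the match, so a recording started at DEL can be discarded at that point.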

  9. Design Features of Network DVR
     • A Deterministic Finite Automaton (DFA) is used to match incoming traffic against the 3 sets of trigger rules
       • Accept states correspond to matched rules
     • Extend the basic DFA to control the packet recorder and to “remember” the state of the packet recorder (e.g., whether the recorder is currently recording)
     • Handle matches for a flow that cross a packet boundary (need to maintain a flow table and flow states in the DFA)

  10. 1. Trigger Rules Creation
      • Start, abort, and final trigger rules are referred to as α, β, and γ rules and are grouped into 3 sets Ωα, Ωβ, and Ωγ, respectively
      Application-specific signatures → design trigger rules → corresponding rulesets:
        α1 = http://   β11 = \.com   β12 = \.org   γ1 = \.edu
        α2 = Del       β21 = \r      β22 = \n      γ2 = ATT
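
One possible way to hold the signatures and derive the three rulesets is sketched below. The `Signature` class and `build_rulesets` function are hypothetical names, and the ASCII keys a1, b11, g1, and so on stand for the start, abort, and final rules listed on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    start: str     # start rule (regex source text)
    aborts: tuple  # abort rules
    finals: tuple  # final rules


SIGNATURES = {
    1: Signature(r"http://", (r"\.com", r"\.org"), (r"\.edu",)),
    2: Signature(r"Del", (r"\r", r"\n"), (r"ATT",)),
}


def build_rulesets(signatures):
    """Group all rules into the start, abort, and final sets."""
    starts, aborts, finals = {}, {}, {}
    for i, sig in signatures.items():
        starts[f"a{i}"] = sig.start
        for j, rule in enumerate(sig.aborts, 1):
            aborts[f"b{i}{j}"] = rule
        for k, rule in enumerate(sig.finals, 1):
            # A single final rule is named g<i>, as on the slide.
            name = f"g{i}" if len(sig.finals) == 1 else f"g{i}{k}"
            finals[name] = rule
    return starts, aborts, finals


starts, aborts, finals = build_rulesets(SIGNATURES)
print(starts, aborts, finals)
```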

  11. 2. Construct DFA to Implement Matching
      • Each accept state corresponds to 1 or more matched rules from the 3 rulesets
      Compiled DFA for the rulesets above (state 0 is the initial state; remaining transitions lead to state 0):
        from any state: h → 1, then t → 2, t → 3, p → 4, : → 5, / → 6, / → 7 (accept: α1)
        from any state: \. → 8, then e → 9, d → 10, u → 11 (accept: γ1)
                                or c → 12, o → 13, m → 14 (accept: β11)
                                or o → 15, r → 16, g → 17 (accept: β12)
        from any state: D → 18, then e → 19, l → 20 (accept: α2)
        from any state: A → 21, then T → 22, T → 23 (accept: γ2)
        from any state: \r → 24 (accept: β21)
        from any state: \n → 25 (accept: β22)
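
For literal rules like the ones in this example, a combined matching DFA can be built with a simple prefix-based construction. The sketch below is an assumption-laden stand-in, not the DFA compiler used in the talk: `build_dfa` and `run` are hypothetical names, the rules are written as literal strings (so \.edu becomes ".edu"), and state numbering will generally differ from the slide.

```python
RULES = {"a1": "http://", "b11": ".com", "b12": ".org", "g1": ".edu",
         "a2": "Del", "b21": "\r", "b22": "\n", "g2": "ATT"}


def build_dfa(rules):
    """States are the prefixes of all rules; state 0 is the empty prefix."""
    prefixes = {""}
    for lit in rules.values():
        prefixes.update(lit[:i] for i in range(1, len(lit) + 1))
    alphabet = {ch for lit in rules.values() for ch in lit}
    ids = {p: n for n, p in enumerate(sorted(prefixes, key=lambda p: (len(p), p)))}
    delta, accepts = {}, {}
    for prefix, state in ids.items():
        for ch in alphabet:
            nxt = prefix + ch
            while nxt not in prefixes:  # fall back to the longest matching suffix
                nxt = nxt[1:]
            delta[(state, ch)] = ids[nxt]
        hits = {name for name, lit in rules.items() if prefix.endswith(lit)}
        if hits:
            accepts[state] = hits
    return delta, accepts


def run(delta, accepts, text):
    """Return (position, rule name) for every rule match in text."""
    state, found = 0, []
    for pos, ch in enumerate(text):
        state = delta.get((state, ch), 0)  # unknown characters reset to state 0
        found.extend((pos, name) for name in sorted(accepts.get(state, ())))
    return found


delta, accepts = build_dfa(RULES)
num_states = len({state for state, _ in delta})
print(num_states, run(delta, accepts, "http://x.edu"))
```

For these eight rules the construction produces 26 states, the same count as the states numbered 0 to 25 in the compiled DFA on the slide.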

  12. 3. Trigger Recording Behavior
      • For each signature MAi = {αi, βi1 … βij, γi1 … γik}, there is a corresponding variable vi that gets set when its corresponding start rule has been matched
      • Upon encountering an abort or final trigger rule, we check whether there is an active recording for this rule by testing vi. If it is set, the recording is aborted or flushed, respectively, and vi is reset. If it is not set, the matched abort or final trigger rule is ignored
      Actions attached to the DFA accept states:
        Matching index MA1:
          α1 (http://): set v1; start recording
          γ1 (\.edu): if (v1): reset v1; flush recording
          β11 (\.com), β12 (\.org): if (v1): reset v1; abort recording
        Matching index MA2:
          α2 (Del): set v2; start recording
          γ2 (ATT): if (v2): reset v2; flush recording
          β21 (\r), β22 (\n): if (v2): reset v2; abort recording
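
The accept-state actions can be written as a small dispatch routine. In this sketch the `ACTIONS` table, the function `on_accept`, and the log strings are all illustrative; the set `v` plays the role of the variables v1 and v2, and the ASCII rule names a/b/g stand for the start, abort, and final rules.

```python
# Rule name -> (kind, signature index); names mirror the slide's labels.
ACTIONS = {
    "a1": ("start", 1), "b11": ("abort", 1), "b12": ("abort", 1), "g1": ("final", 1),
    "a2": ("start", 2), "b21": ("abort", 2), "b22": ("abort", 2), "g2": ("final", 2),
}


def on_accept(rule, v, log):
    """Apply the action for a matched rule; v holds the indices whose v_i is set."""
    kind, i = ACTIONS[rule]
    if kind == "start":
        v.add(i)
        log.append(f"start recording {i}")
    elif i in v:
        v.discard(i)
        action = "flush" if kind == "final" else "abort"
        log.append(f"{action} recording {i}")
    # Otherwise v_i is not set, so the abort or final match is ignored.


v, log = set(), []
for rule in ["g1", "a1", "b21", "a2", "b11", "g2"]:
    on_accept(rule, v, log)
print(log)
```

The first g1 and the b21 match are ignored because the corresponding variable is not set at that moment.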

  13. 3. Trigger Recording Issues
      • How to efficiently examine the trigger condition?
        • To test if vi is set for flow f, we can perform a hash lookup on the key f:vi, where the key is constructed by combining the flow ID f and the variable name vi
        • We can perform a hash insert (or lookup-then-insert) or a hash delete with the key f:vi to set or delete vi for flow f
      • What if multiple recording requests are triggered?
        • Worst-case bound on memory bandwidth/processing time is O(N), where N = total number of signatures
        • Network DVR uses a single aggregated recording string for each flow to guarantee only one memory copy for each incoming symbol
        • By logging the recording-begin/end memory positions of each valid matching result in the aggregated recording string, the system can output all recorded matching strings for each application-specific signature
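
Both ideas on this slide, the f:vi hash key and the single aggregated recording string per flow, can be sketched together. The class `FlowRecorder` and its method names are hypothetical; Python dictionaries stand in for the hash table, the tuple `(flow, sig)` stands in for the key f:vi, and the end position of a recording is taken to be the current end of the buffer.

```python
from collections import defaultdict


class FlowRecorder:
    def __init__(self):
        self.begin = {}                  # (flow, sig) -> begin offset; key present means v_i is set
        self.active = defaultdict(int)   # flow -> number of active recordings
        self.buffer = defaultdict(list)  # flow -> aggregated recording string
        self.copies = 0
        self.output = []

    def start(self, flow, sig):
        if (flow, sig) not in self.begin:
            self.active[flow] += 1
        self.begin[(flow, sig)] = len(self.buffer[flow])  # log the begin position

    def symbol(self, flow, ch):
        if self.active[flow]:  # one copy, however many recordings are active
            self.buffer[flow].append(ch)
            self.copies += 1

    def stop(self, flow, sig, flush):
        begin = self.begin.pop((flow, sig), None)
        if begin is None:
            return  # v_i not set for this flow: ignore
        if flush:
            self.output.append((sig, "".join(self.buffer[flow][begin:])))
        self.active[flow] -= 1
        if self.active[flow] == 0:
            self.buffer[flow].clear()  # nothing references the buffer any more


rec = FlowRecorder()
rec.start("f1", 1)
for ch in "ab":
    rec.symbol("f1", ch)
rec.start("f1", 2)
for ch in "cd":
    rec.symbol("f1", ch)
rec.stop("f1", 2, flush=True)
rec.symbol("f1", "e")
rec.stop("f1", 1, flush=True)
print(rec.output, rec.copies)
```

While two recordings overlap on flow f1, each incoming symbol is still appended only once; the two outputs are slices of the same buffer.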

  14. 4. Memory Management
      • Need constant-time memory operations
      [Figure: a flow table (entries Fid 1, Fid 5, Fid 108, each with its DFA state) points into a pool of memory cells holding the recorded characters, alongside a free list. Valid recordings are sent to an output queue for flushing to disk and their memory cells are recycled to the free list; aborted recordings have their memory cells recycled directly.]
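
One way to get constant-time allocation and recycling is a fixed pool of cells threaded onto a free list, with each recording kept as a linked chain of cells. This is only a sketch of that general technique under assumed names (`CellPool`, `new_chain`); the slide does not spell out the actual data layout.

```python
def new_chain():
    """An empty recording: indices of its first and last cell (-1 if none)."""
    return {"head": -1, "tail": -1}


class CellPool:
    def __init__(self, size):
        self.data = [None] * size
        self.next = [i + 1 for i in range(size)]  # initially every cell is free
        if size:
            self.next[-1] = -1
        self.free_head = 0 if size else -1

    def append(self, chain, ch):
        """O(1): take one cell from the free list and link it to the chain."""
        cell = self.free_head
        if cell == -1:
            raise MemoryError("cell pool exhausted")
        self.free_head = self.next[cell]
        self.data[cell] = ch
        self.next[cell] = -1
        if chain["tail"] == -1:
            chain["head"] = cell
        else:
            self.next[chain["tail"]] = cell
        chain["tail"] = cell

    def recycle(self, chain):
        """O(1): splice the whole chain back onto the free list."""
        if chain["head"] == -1:
            return
        self.next[chain["tail"]] = self.free_head
        self.free_head = chain["head"]
        chain["head"] = chain["tail"] = -1

    def read(self, chain):
        """Walk the chain to produce the recorded string (used when flushing)."""
        out, cell = [], chain["head"]
        while cell != -1:
            out.append(self.data[cell])
            cell = self.next[cell]
        return "".join(out)


pool = CellPool(4)
chain = new_chain()
for ch in "abc":
    pool.append(chain, ch)
recorded = pool.read(chain)
pool.recycle(chain)  # abort or post-flush cleanup, independent of chain length
print(recorded)
```

Recycling touches only the chain's tail pointer and the free-list head, so aborting a long recording costs the same as aborting a short one.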

  15. Evaluation Setup
      • Use real data traces from a large ISP
        • Traces collected on a trunk with four Gigabit Ethernet links at a large enterprise Internet gateway by using IP filtering
        • Partitioned into 10 datasets of 60-minute intervals
        • Each dataset has approx. 3,500 flows
        • For each dataset, with the given application-specific signatures, we replayed the complete trace and calculated the number of memory copies that Network DVR needed vs. an eager approach in which recording starts when the first character of a regular expression is matched
      • Use practical signatures from Snort
        • Use Snort 2007 FTP signatures (58 regexes) to evaluate the amount of unnecessary memory copies that Network DVR can avoid by using the proposed triggered-recording concept
        • Use the public-domain DFA implementation provided by Becchi and Crowley [CoNEXT 2008] as the matching module in Network DVR

  16. Evaluation Results
      • Comparison of memory copy times
        • “data” shows the size of the total incoming symbols
        • “eager” begins copying symbols to memory when the first character of a regular expression is matched
        • “netDVR” uses our triggered-recording approach
        • “actual” shows the memory needed to record the details of matching results (Snort signatures)

  17. Evaluation Results (cont’d)
      • Comparison of memory copy times
      [Figure: memory copy times (log scale) for each data set]
      • Reduction = eager/netDVR: 500-800x
      • Overhead = netDVR/actual: only 1.48-1.6x

  18. Conclusions
      • Proposed Network DVR as a programmable application-aware packet recording system
      • Proposed the Triggered-Recording concept to minimize the impact of false positive recordings (partial recordings that will eventually fail)
      • Experimental results on real data sets from a large enterprise gateway demonstrate a 500-800x reduction in memory copies

  19. Thank You
