Swift: A Fast Dynamic Packet Filter Zhenyu Wu, Mengjun Xie, Haining Wang The College of William and Mary Presenter: Zhenyu Wu
Presentation Outline
• Background
• The Problems
• Previous Work
• Our Proposed Solution: Swift
• Design & Implementation
• Evaluation
• Conclusion
Background
• What is a packet filter?
  • A kernel facility that classifies and conveys network packets according to criteria specified by a user application, bypassing the normal network stack.
• Why is it needed?
  • Networking protocol / application debugging
  • Network security monitoring
Background
• The pioneer, CSPF
• Characteristics inherited by all subsequent works:
  • Kernel residing
  • In-place filtering
[Figure: process P1 in user space; packet filter in the kernel]
Background
• The milestone, BPF
• Novel techniques:
  • Control Flow Graph filtering
  • Register pseudo-machine
• High performance; cross-platform compatibility; strong user library support
• Remains the most widely used packet filter nearly 15 years after its birth
The Problems (in BPF)
• Dynamic filtering: certain filtering criteria may not be predetermined
  • Filter updates have to be performed online
• Long filter install latency (milliseconds ~ seconds)
  • Compilation: O(n)
  • User-kernel copying: O(n)
  • Security checking: O(n)
  • Each step is O(n) regardless of the number of changes, say x
[Figure: FTP Passive Data Transfer. Client/server exchange on the command channel along time axis t: Login; other commands; PASV; 227 Entering Passive Mode (x,x,x,x,y,y); RETR somefile.dat; 150 Opening data connection for somefile.dat; followed by the data channel. Markers: "Port = ?", Filter Update Start, Filter Update End.]
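To make the "Port = ?" problem concrete, here is a minimal sketch of how a monitoring application could derive the data-channel port from the 227 reply. The (x,x,x,x,y,y) layout follows the standard FTP convention (four address bytes, then the port as high byte and low byte). The function names and the filter string built at the end are illustrative assumptions, not part of Swift or of the slides.

```python
import re


def parse_pasv_reply(reply):
    """Return (host, port) from an FTP '227 Entering Passive Mode' reply."""
    match = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", reply)
    if match is None or not reply.startswith("227"):
        raise ValueError("not a 227 passive-mode reply")
    h1, h2, h3, h4, p_hi, p_lo = (int(group) for group in match.groups())
    # The port is sent as two decimal bytes: high byte first, then low byte.
    return "%d.%d.%d.%d" % (h1, h2, h3, h4), p_hi * 256 + p_lo


def data_session_filter(reply):
    """Build a pcap-style filter expression for the announced data session."""
    host, port = parse_pasv_reply(reply)
    return "ip host %s and tcp port %d" % (host, port)


if __name__ == "__main__":
    print(data_session_filter("227 Entering Passive Mode (192,168,1,2,19,137)"))
```

The filter can only be written after this reply has been seen, which is why the install latency sits directly on the critical path before the data connection opens.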
The Problems (in BPF)
• Filter execution efficiency
  • High interpretation overhead: pseudo-machine vs. filter program execution
[Figure: interpreter steps around one instruction: … Load instruction [IP]; Advance IP; Branch according to instruction type; * Load high word from packet[20]; Check if IP is at the end of program … Annotation: 56% overhead.]
Example BPF filter program:
  (000) ldh [12]
  (001) jeq #0x800 jt 2 jf 14
  (002) ldb [23]
  (003) jeq #0x6 jt 4 jf 14
  (004) ldh [20]
  (005) jset #0x1fff jt 14 jf 6
  (006) ldxb 4*([14]&0xf)
  (007) ldh [x + 14]
  (008) jeq #0x50 jt 13 jf 9
  (009) jeq #0x19 jt 13 jf 10
  (010) ldh [x + 16]
  (011) jeq #0x50 jt 13 jf 12
  (012) jeq #0x19 jt 13 jf 14
  (013) ret #96
  (014) ret #0
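The interpretation overhead can be illustrated with a toy interpreter for the handful of opcodes used in the listing above. This is a sketch only: the tuple encoding, the opcode names `ldh_x` and `ldxb_msh`, and the packet builder are assumptions made for illustration, and the code does not reproduce the kernel's BPF implementation or the measured overhead figure. It only shows that every instruction pays for a fetch, an advance, and a dispatch before doing any filter work.

```python
# The slide's program as (opcode, k, jt, jf); jump targets are absolute indices.
PROGRAM = [
    ("ldh", 12, 0, 0), ("jeq", 0x800, 2, 14),
    ("ldb", 23, 0, 0), ("jeq", 0x6, 4, 14),
    ("ldh", 20, 0, 0), ("jset", 0x1FFF, 14, 6),
    ("ldxb_msh", 14, 0, 0),
    ("ldh_x", 14, 0, 0), ("jeq", 0x50, 13, 9), ("jeq", 0x19, 13, 10),
    ("ldh_x", 16, 0, 0), ("jeq", 0x50, 13, 12), ("jeq", 0x19, 13, 14),
    ("ret", 96, 0, 0), ("ret", 0, 0, 0),
]


def load(packet, offset, size):
    """Read a big-endian integer; fail if the read would pass the packet end."""
    if offset + size > len(packet):
        raise IndexError("load past end of packet")
    return int.from_bytes(packet[offset:offset + size], "big")


def run_filter(program, packet):
    """Return (verdict, instructions_executed) for one packet."""
    a = x = 0  # accumulator and index register
    pc = steps = 0
    try:
        while True:
            op, k, jt, jf = program[pc]  # overhead: fetch instruction
            pc += 1                      # overhead: advance
            steps += 1
            if op == "ldh":              # overhead: dispatch on opcode
                a = load(packet, k, 2)
            elif op == "ldb":
                a = load(packet, k, 1)
            elif op == "ldh_x":
                a = load(packet, x + k, 2)
            elif op == "ldxb_msh":
                x = 4 * (load(packet, k, 1) & 0x0F)
            elif op == "jeq":
                pc = jt if a == k else jf
            elif op == "jset":
                pc = jt if a & k else jf
            elif op == "ret":
                return k, steps
            else:
                raise ValueError("unknown opcode: " + op)
    except IndexError:
        return 0, steps  # out-of-bounds load rejects the packet


def make_packet(src_port, dst_port, ethertype=0x0800):
    """Hypothetical helper: Ethernet + 20-byte IPv4 + TCP header, mostly zeros."""
    eth = bytes(12) + ethertype.to_bytes(2, "big")
    ip = bytes([0x45, 0]) + bytes(6) + bytes([64, 6]) + bytes(10)
    tcp = src_port.to_bytes(2, "big") + dst_port.to_bytes(2, "big") + bytes(16)
    return eth + ip + tcp


if __name__ == "__main__":
    print(run_filter(PROGRAM, make_packet(40000, 80)))
```

For a TCP packet to port 80 the sketch executes 13 interpreted instructions to reach `ret #96`, each one passing through the same fetch-advance-dispatch loop.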
Previous Solutions
Main focus: improving filter execution efficiency
• xPF: Storage and loops
• MPF: Associative matching
• DPF: Dynamic filter program generation
  • Compiles the filter program into native binary code on the fly, instead of using a pseudo-machine interpreter.
Previous Solutions
• FFPF: Packet Filtering Framework
  • External functions
    • Pieces of filter programs pre-compiled into native binary code
    • Loaded into the kernel as needed
  • Addresses both update latency and execution efficiency, but introduces new complications
    • Kernel module programming
    • Lack of library support, limited resources, hard to debug
    • Potential security risks
Swift: Our Solution
• The Swift Packet Filter
  • Inherits “pro factors” of BPF: pseudo-machine, control flow, and filter language primitives
  • Renovates the filter engine
    • Optimized towards fast filter update
    • Increased instruction set efficiency
• Achieves significant speedup against BPF
  • Up to three orders of magnitude speedup in filter update
  • Up to three times as fast as BPF in filter execution
• Implemented in the Linux 2.6 kernel, completely compatible with BPF
Swift Design: Specialized Instruction Set
• Rationale: avoid filter compilation (> 99% of the delay)
  • Compilation: translation from a high level descriptive language to a low level machine language
  • Specializing the instruction set: narrow the gap between the high level language and the low level language
  • O(n) → 0
• How it is done:
  • Formulate directly on BPF language primitives
  • Classify by method of addressing data
  • Abstract data manipulation operations
  • Extend instruction functionalities
[Figure: Addressing Modes over the packet layout MAC header | IP header | TCP header | Payload: Direct Addressing (absolute offset); Layer 1 Indirect Addressing and Layer 2 Indirect Addressing (relative offset, variable header length). BPF primitives (selection): ether proto, ip proto, ip host, ip net, tcp port, ip payload. Swift instructions (selection): D_SEQ, L1_LEQ, D_MEQ, L1_SEQ, D_EQ.]
Swift Design: Strict Program Organization
• Rationale: incremental program update
  • Swiftly and accurately “pinpoint” the part to be updated
  • Only modify what is necessary: reduce unnecessary user-kernel data transfer
  • O(n) → O(x) [x: number of changes]
• How it is done:
  • Fixed-length instruction set
  • Index each instruction
  • Index each control flow in the filter program
• To locate an instruction: P[X]->I[Y]
• To locate a single parameter: P[X]->I[Y]->T[Z]
[Figure: program layout. Instruction (32 bytes): Command, T[0] … T[6]. Pass (Control Flow) (1024 bytes): States, I[0] … I[29], Padding. PassSet (Filter Program) (128 kilobytes): Info, P[0] … P[126], Padding.]
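The indexing scheme P[X]->I[Y]->T[Z] can be sketched with plain nested lists. The sizes and counts come from the slide; the equal-width field assumption (32 bytes split across Command plus seven parameters), the dictionary representation, and the `apply_updates` helper are hypothetical and only illustrate why an update touches x parameters rather than the whole program.

```python
# Sizes and counts taken from the slide.
INSTRUCTION_BYTES = 32
PASS_BYTES = 1024
PASSSET_BYTES = 128 * 1024
PARAMS_PER_INSTRUCTION = 7   # T[0] .. T[6]
INSTRUCTIONS_PER_PASS = 30   # I[0] .. I[29]
PASSES_PER_SET = 127         # P[0] .. P[126]

# Assumption: Command and each T[z] have equal width, giving 4-byte fields.
FIELD_BYTES = INSTRUCTION_BYTES // (1 + PARAMS_PER_INSTRUCTION)

# Space left in each container for States/Info plus padding.
PASS_SLACK = PASS_BYTES - INSTRUCTIONS_PER_PASS * INSTRUCTION_BYTES
PASSSET_SLACK = PASSSET_BYTES - PASSES_PER_SET * PASS_BYTES


def new_passset():
    """Build an empty filter program as passset[x][y] -> instruction."""
    return [
        [{"command": 0, "T": [0] * PARAMS_PER_INSTRUCTION}
         for _ in range(INSTRUCTIONS_PER_PASS)]
        for _ in range(PASSES_PER_SET)
    ]


def apply_updates(passset, updates):
    """Apply (x, y, z, value) edits in place; return how many were applied."""
    for x, y, z, value in updates:
        passset[x][y]["T"][z] = value  # P[x]->I[y]->T[z]
    return len(updates)


if __name__ == "__main__":
    program = new_passset()
    touched = apply_updates(program, [(0, 1, 1, 5001)])
    print(FIELD_BYTES, PASS_SLACK, PASSSET_SLACK, touched)
```

Because every instruction has a fixed size and a fixed index, a single changed parameter (for example a new FTP data port) is addressed directly instead of re-sending the full 128-kilobyte program.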
Swift Design: Acyclic Definitive Finite Automata
• Rationale: eliminate security checking
  • Is all that computational power necessary?
    • Packet filtering is basically pattern matching with some standard data manipulation operations
    • Principle of Least Privilege (POLP)
  • Why not have a secure filtering engine and forget about security checking? O(n) → 0
• How it is done:
  • Acyclic Definitive Finite Automata
  • Remove execution path control
  • Remove data storage
[Figure: two consecutive passes, P[n] and P[n+1], each laid out as States, I[0] … I[29], Padding.]
Swift Design: Optimizations
• SIMD (Single Instruction, Multiple Data)
  • Pack multiple operands into one instruction
  • Reduce instruction interpretation overhead
  • D_EQ: Direct Addressing, compare 4 bytes
    • e.g. compare packet source IP address against a given value
  • D_EQ_SIMD = up to 7x D_EQ
    • e.g. compare packet source IP address against up to 7 given addresses
• Alternative instruction usage
  • Further enhance execution efficiency
  • Example: “ip and tcp port 21”
  • 6 BPF instructions vs. 1 Swift instruction
“ether proto ip and tcp port 21”, the BPF way:
  (000) ldh [12]
  (001) jeq #0x800 jt 2 jf 12
  (002) ldb [23]
  (003) jeq #0x6 jt 4 jf 12
  (004) ldh [20]
  (005) jset #0x1fff jt 12 jf 6
  (006) …
“ether proto ip and tcp port 21”, the Swift way:
  I[0]: D_LEQ_M @ 12, 0xFFFF0000, 0x08000000, 0x00000000, 0x00000000, 0x1FFF00FF, 0x00000006
  I[1]: …
[Figure: instruction layout Command, T[0] … T[6] mapped onto 32-bit rows of the packet: ether_proto, ip_ver, ip_hl, ip_tos, ip_total_length, ip_ident, ip_fl, ip_fragment, ip_ttl, ip_proto; the three checks are labeled IP, TCP, Non-frag. Annotation: 28% overhead.]
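One plausible reading of the `D_LEQ_M @ 12, …` instruction is a masked comparison over consecutive 32-bit words starting at byte offset 12, with the six remaining operands taken as three (mask, value) pairs. That pairing is inferred from the operand values on the slide and is an assumption, as are the function name and packet builder below; the slides do not spell out the instruction's exact semantics.

```python
def d_leq_m(packet, offset, pairs):
    """Check (word_i & mask_i) == value_i for consecutive 32-bit words."""
    for i, (mask, value) in enumerate(pairs):
        start = offset + 4 * i
        if start + 4 > len(packet):
            return False  # packet too short to hold this word
        word = int.from_bytes(packet[start:start + 4], "big")
        if word & mask != value:
            return False
    return True


# Operands of I[0] from the slide, grouped as assumed (mask, value) pairs.
IP_TCP_NONFRAG = [
    (0xFFFF0000, 0x08000000),  # bytes 12-13: EtherType == 0x0800 (IP)
    (0x00000000, 0x00000000),  # bytes 16-19: ignored
    (0x1FFF00FF, 0x00000006),  # fragment offset == 0 and protocol == 6 (TCP)
]


def make_packet(proto=6, frag_offset=0, ethertype=0x0800):
    """Hypothetical helper: Ethernet + 20-byte IPv4 header, other fields zero."""
    eth = bytes(12) + ethertype.to_bytes(2, "big")
    ip = (bytes([0x45, 0]) + bytes(4) + frag_offset.to_bytes(2, "big")
          + bytes([64, proto]) + bytes(10))
    return eth + ip + bytes(20)


if __name__ == "__main__":
    print(d_leq_m(make_packet(), 12, IP_TCP_NONFRAG))
    print(d_leq_m(make_packet(proto=17), 12, IP_TCP_NONFRAG))
```

Under this reading, the three tests that BPF spreads over six interpreted instructions (IP, TCP, non-fragment) are evaluated inside a single instruction dispatch.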
Swift Design: Tradeoffs
• Control flow level redundancy
  • Due to removal of compilation & optimization
  • Example: “(ip host 1.2.3.4 and tcp 21) or (ip host 1.2.3.4 and tcp 22)”
  • Solution: Pass duplication
    • Explicit incremental optimization (by user program)
• Instruction level redundancy
  • Due to removal of data storage
  • Example: Indirect Addressing Operations
  • Solution: Computation cache
    • Implicit data passage (hint)
[Figure: BPF Filter Schematics and Swift Filter Schematics for “(ip host 1.2.3.4 and tcp 21) or (ip host 1.2.3.4 and tcp 22)”. The BPF graph chains checks such as IP?, Host src 1.2.3.4?, Host dst 1.2.3.4?, TCP?, Non-frag?, Src Port 21?, Dst Port 21?, Src Port 22?, Dst Port 22?. The Swift version uses passes (Pass 1 (Parent), Pass 2 (Child), Pass 3), each a list I[0] … I[29] with checks such as TCP/IP Non-Frag?, Host Src/Dst 1.2.3.4?, Src/Dst Port 21?, Src/Dst Port 22?.]
[Figure: Indirect Address Calculation and Computation Cache for Layer Header Length over MAC header | IP header | TCP header | Payload: L1HDR and L2HDR start unknown (??) and are filled in once computed (example values shown: L1HDR = 16, L2HDR = 24).]
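The computation cache idea can be sketched as a per-packet memo of a header length that several indirect-addressing checks need. The class below is an illustrative assumption: it caches only the IPv4 header length, assumes an untagged 14-byte Ethernet header, and does not model how Swift actually stores its L1HDR/L2HDR values.

```python
class PacketView:
    """Per-packet view that computes the IP header length at most once."""

    ETH_HEADER_LEN = 14  # assumption: untagged Ethernet II frame

    def __init__(self, packet):
        self.packet = packet
        self._ip_header_len = None  # unknown until first needed
        self.computations = 0       # how often the length was derived

    def ip_header_len(self):
        if self._ip_header_len is None:
            self.computations += 1
            ihl = self.packet[self.ETH_HEADER_LEN] & 0x0F
            self._ip_header_len = 4 * ihl
        return self._ip_header_len

    def tcp_field16(self, rel_offset):
        """Read a 16-bit field at an offset relative to the TCP header."""
        start = self.ETH_HEADER_LEN + self.ip_header_len() + rel_offset
        return int.from_bytes(self.packet[start:start + 2], "big")


def make_packet(src_port, dst_port, ihl=5):
    """Hypothetical helper: IPv4 header of ihl 32-bit words, then TCP ports."""
    eth = bytes(12) + (0x0800).to_bytes(2, "big")
    ip = (bytes([0x40 | ihl, 0]) + bytes(6) + bytes([64, 6]) + bytes(10)
          + bytes(4 * (ihl - 5)))
    tcp = src_port.to_bytes(2, "big") + dst_port.to_bytes(2, "big") + bytes(16)
    return eth + ip + tcp


if __name__ == "__main__":
    view = PacketView(make_packet(40000, 22, ihl=6))
    print(view.tcp_field16(0), view.tcp_field16(2), view.computations)
```

Both port checks rely on the same variable header length, so without data storage each would recompute it; with the cache the second check reuses the first result.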
Swift Implementation
• Integrated into a recent Linux 2.6 kernel
  • Added 3 files & modified 9 files
  • Supports both i386 and x86_64 platforms
• Completely compatible with BPF
  • Swift coexists with Linux Socket Filter (LSF), a module equivalent to BPF on Linux systems
  • Both are controlled using the setsockopt() system call
  • Both work with libpcap
• Separate user level control library
  • User friendly interface
  • Supports object oriented programming
Swift Evaluation: Setup
• Testbed
  • Managed Gigabit Ethernet switch
  • Dedicated machine for high speed trace playback
  • Three test machines with different hardware setups
    • Processor cores: Pentium 4 ~ Xeon EM64T DualCore
    • Processor cache (L2): 512KB ~ 4MB
    • Front Side Bus: 533MHz ~ 1333MHz
Swift Evaluation: Dynamic Filtering
• Description
  • Application: FTP passive data transfer capturing
  • Workload: traffic trace of 1-200 concurrent FTP data sessions, with and without high speed background traffic
  • Metric: filter update latency and number of missed packets
• Results
  • Swift’s update latency is three orders of magnitude lower than LSF’s.
  • Swift reduces the number of missed packets per connection by about two orders of magnitude in comparison with LSF.
[Figures: Filter Update Latencies on PC3; Number of Missed Packets on PC3]
Swift Evaluation: Static Filtering
• Description
  • Six different filtering criteria with increasing complexity and workload:
    • “”
    • “ip”
    • “ip src or dst net aa.bb.cc.0/24”
    • “ip src net aa.bb.cc.0/24 and dst net dd.0.0.0/8”
    • “ip and tcp port (…6 port numbers…)”
    • “ip and not tcp port (…3 port numbers…) and not host (…38 hosts…)”
  • Compare Swift against LSF and optimized C (Opt-C)
  • Metric: x86 Time-Stamp Counter (TSC) cycles
• Results
  • For simple criteria: Swift is slightly slower than, but comparable to, LSF
  • For complex criteria: Swift outperforms LSF, with up to three times the performance
[Figure: All Filter / PC Performance Chart]
Conclusion
• We propose Swift as a new solution to speed up network packet filtering
• Key features:
  • Highly efficient instruction set
  • Simple computational model
  • Compatibility with BPF and its user library
• Significant improvement over BPF
  • Update latency (three orders of magnitude faster)
  • Execution efficiency (up to three times the performance)