Insights Into RouterVM’s Flexibility and Performance Mel Tsai mtsai@eecs.berkeley.edu
Outline • Network Appliance Convergence • Brief Overview of RouterVM & GPFs • GPF Flexibility • GPF Performance • Demo
New Requirements in the Enterprise [Figure: enterprise network diagram. Labeled elements: server load balancer, content cache, switches, IP storage gateway, intrusion detection, firewall / VPN, link compressor, SAN, client workstations, server blades, offsite, ISP edge router. Link speed labels: 40 Mbps, 200 Mbps, 1 Gbps, 1-2.5 Gbps, 2.5 Gbps, 2.5 - 10 Gbps.]
Network Appliance Convergence • Recent strong trend towards consolidating multiple functions into one appliance • NetScaler, F5, Redline, Tasman, Inkra • The hardware is coming… We are slowly reaching the point where we can do almost anything to packet flows at line rate • But how do you manage multiple devices/functions in your network? • What about configurability and ease-of-deployment? • Can end-users or administrators program the device? • What about the user interface?
RouterVM Overview • RouterVM turns the concept of a “packet filter” into a high-level, programmable building-block for network appliance applications
[Figure: a Traditional Filter shown next to a RouterVM Generalized Packet Filter (type L7); each consists of Classification Parameters and an Action]
FILTER 19 SETUP
• NAME: example
• SIP: any
• SMASK: 255.255.255.255
• DIP: 10.0.0.0
• DMASK: 255.255.255.0
• PROTO: tcp,udp
• SRC PORT: any
• DST PORT: 80
• VLAN: default
• ACTION: drop
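To make the classification-plus-action idea concrete, here is a minimal Python sketch of how a traditional filter like FILTER 19 might be evaluated. The field values come from the slide; the dictionary layout, the `matches` and `decide` helpers, the packet representation, and the default verdict of "allow" are assumptions for illustration and are not RouterVM's implementation. SMASK and VLAN are left out to keep the sketch short.

```python
import ipaddress

# Values copied from the "FILTER 19" slide; the dictionary layout is hypothetical.
FILTER_19 = {
    "name": "example",
    "sip": "any",
    "dip": "10.0.0.0",
    "dmask": "255.255.255.0",
    "proto": {"tcp", "udp"},
    "src_port": "any",
    "dst_port": 80,
    "action": "drop",
}

def matches(rule, pkt):
    """Return True when every classification parameter of the rule accepts the packet."""
    if rule["sip"] != "any" and pkt["sip"] != rule["sip"]:
        return False
    # Destination must fall inside DIP/DMASK (here 10.0.0.0/255.255.255.0).
    dst_net = ipaddress.ip_network(f"{rule['dip']}/{rule['dmask']}")
    if ipaddress.ip_address(pkt["dip"]) not in dst_net:
        return False
    if pkt["proto"] not in rule["proto"]:
        return False
    if rule["src_port"] != "any" and pkt["src_port"] != rule["src_port"]:
        return False
    return rule["dst_port"] == "any" or pkt["dst_port"] == rule["dst_port"]

def decide(rule, pkt, default="allow"):
    """Apply the rule's action on a match; the default verdict is an assumption."""
    return rule["action"] if matches(rule, pkt) else default

web_pkt = {"sip": "192.0.2.1", "dip": "10.0.0.42", "proto": "tcp",
           "src_port": 40000, "dst_port": 80}
print(decide(FILTER_19, web_pkt))
```

A traditional filter stops here: a fixed set of header fields and a single verdict. The GPF idea extends both sides, with deeper classification and a richer set of actions.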
Trade-offs for GPF Flexibility
Two ends of the spectrum: greater flexibility, more difficult to use vs. less flexibility, easier to use (…and generally higher performance? …and generally lower performance?)
• # of classification fields: more vs. fewer
• Classification depth: deeper vs. shallower
• # of actions: more vs. fewer
• # of programmatic elements: more vs. fewer
• # of packet tagging options: more vs. fewer
• # of control flow options: more vs. fewer
• Extent and variety of per-flow state: more vs. fewer
(cont)
Trade-offs for GPF Flexibility (cont) • Where is the sweet spot? Depends on the application and usage scenario!
Trade-offs for GPF Flexibility (cont) • Greater flexibility is only (somewhat) more difficult to use: a complexity-hiding intelligent interface and the use of smart defaults can shift the sweet spot towards greater flexibility, without decreasing ease of use.
How many GPF types are enough? • Not a simple question, since the number of applications and usage scenarios supported by a library of GPFs is not equal to the number of available GPFs • By virtue of a common set of available actions, any GPF can support the following features: • Programmatic decision making (“if dest_ip == 127.0.0.0 then drop;”) • Server load balancing (“loadbalance table SLB_Table;”) • Packet field rewriting (“rewrite dest_ip 192.168.0.1;”) • Packet duplication (“copy;”) • QoS (“ratelimit 1 Mbps;”) • Packet logging (“log intrusion_log.txt;”) • Network address translation (“nat dir=forward, table=NAT_table;”) • Server health monitoring (“if 192.168.0.5 is alive”) • …and others • In practice, actions multiply the base-level functionality of a given GPF to a much higher level than its name suggests • Example: “A server load-balancing, bandwidth throttling, health monitoring, and statistics-gathering ‘L7 filter’”
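The point that a shared action set multiplies what any one GPF can do can be sketched in a few lines of Python. Everything here is hypothetical: the `Filter` class, the dictionary packet representation, and the `drop_if`, `log_to`, and `rewrite` helpers only mimic three of the actions quoted above and do not reflect RouterVM's actual syntax or internals.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Packet = Dict[str, object]          # hypothetical packet representation
Action = Callable[[Packet], None]   # an action annotates or mutates a packet

def drop_if(predicate: Callable[[Packet], bool]) -> Action:
    """Mimics 'if <condition> then drop;'."""
    def apply(pkt: Packet) -> None:
        if predicate(pkt):
            pkt["dropped"] = True
    return apply

def log_to(sink: List[str]) -> Action:
    """Mimics 'log <file>;' by appending to an in-memory list."""
    def apply(pkt: Packet) -> None:
        sink.append(f"{pkt['src_ip']} -> {pkt['dest_ip']}")
    return apply

def rewrite(field_name: str, value: object) -> Action:
    """Mimics 'rewrite <field> <value>;'."""
    def apply(pkt: Packet) -> None:
        pkt[field_name] = value
    return apply

@dataclass
class Filter:
    name: str
    matches: Callable[[Packet], bool]                    # classification
    actions: List[Action] = field(default_factory=list)  # shared action set

    def process(self, pkt: Packet) -> Packet:
        if self.matches(pkt):
            for action in self.actions:
                if pkt.get("dropped"):
                    break  # later actions are skipped once a packet is dropped
                action(pkt)
        return pkt

intrusion_log: List[str] = []
basic = Filter(
    name="example",
    matches=lambda p: p["dest_port"] == 80,
    actions=[
        drop_if(lambda p: p["dest_ip"] == "127.0.0.0"),
        log_to(intrusion_log),
        rewrite("dest_ip", "192.168.0.1"),
    ],
)
result = basic.process({"src_ip": "10.0.0.7", "dest_ip": "10.0.0.1", "dest_port": 80})
```

The same action list could be attached to any filter type, which is why the number of supported usage scenarios grows faster than the number of GPF types.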
Planned/Implemented GPF Library for RouterVM .NET • Basic Filter • Simple L2-L4 header classifications • Any RouterVM actions • L7 Filter • Adds regular expressions & ADU reconstruction • NAT Filter • Adds a few more capabilities beyond the simple NAT action that is available to all GPFs • Content Caching • Builds on the L7 filter functionality • WAN Link Compression • Relatively simple to specify, but requires lots of computation • IP-to-FC Gateway • Requires its own table format & processing • XML Preprocessing • Not very well documented, and difficulty is unknown…
GPF Flexibility by OSI Layer …As expected, GPF flexibility at the application layers starts to depend heavily on the breadth of the GPF library and the availability of GPFs for specific applications
GPF Performance: Basic Filters • Performance of filters has been measured on RouterVM for .NET using Win32 performance counters • Accurate to roughly 0.5 microseconds • Measured on an Athlon XP 2000 system, Win2k • A basic filter with simple actions (no payload processing) requires roughly 3000 CPU cycles to perform its processing • This is mostly independent of packet size • Results in ~284 Mbps for 64-byte packets, 6.7 Gbps for 1500-byte packets (theoretically of course) • If the average packet size is ~240 bytes, a packet stream can traverse 10 basic filters and still maintain 100 Mbps • …Keep in mind, this is with no optimization (yet)!
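The throughput figures above follow from simple arithmetic, sketched here in Python. The slide does not state a clock rate; the sketch assumes roughly 1.667 GHz for the Athlon XP 2000 system, an assumption that reproduces the quoted numbers. The function name and parameters are illustrative only.

```python
CLOCK_HZ = 1.667e9        # assumed clock rate; not stated on the slide
CYCLES_PER_FILTER = 3000  # from the slide: cycles per packet per basic filter

def throughput_mbps(packet_bytes, filters=1,
                    clock_hz=CLOCK_HZ, cycles_per_filter=CYCLES_PER_FILTER):
    """Theoretical throughput when each packet traverses `filters` basic filters."""
    packets_per_second = clock_hz / (cycles_per_filter * filters)
    return packets_per_second * packet_bytes * 8 / 1e6

print(round(throughput_mbps(64), 1))               # 64-byte packets, one filter
print(round(throughput_mbps(1500) / 1000, 2))      # 1500-byte packets, in Gbps
print(round(throughput_mbps(240, filters=10), 1))  # 240-byte packets, ten filters
```

Because the per-packet cost is mostly independent of packet size, throughput scales linearly with packet size and inversely with the number of filters traversed.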
GPF Performance: Complex Filters • What about complex L7 filters that search packet payloads with regular expressions? • Benchmark setup… Let’s hand-craft a packet stream of 256-byte packets: [Figure: packet layout. L2-L4 Headers | “Retreat” | 25 bytes of char ‘X’ | “Retreat” | 25 bytes of char ‘X’ | “Retreat” | Padding with ‘X’] • Create three different L7 filters, which search for three different patterns: • ^Retreat • ^Retreat.*Retreat • ^Retreat.*Retreat.*Retreat • Although this is instructive, the setup is a little artificial • We’re searching every bit of every packet payload, whereas a real L7 filter would stop when it identifies a flow matching the expression
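The benchmark packet and the three patterns can be reproduced with a short Python sketch. Several things here are assumptions: the 54-byte header length (Ethernet + IPv4 + TCP without options), the ordering of the payload segments as read from the slide's figure, and the use of Python's `re` module in place of the .NET regular expression engine that was actually measured. It checks only that the patterns match; it says nothing about the timings reported in the talk.

```python
import re

HEADER_LEN = 54    # assumed L2-L4 header size; not given on the slide
PACKET_LEN = 256   # from the slide

def build_packet():
    """Build one benchmark packet: headers, three 'Retreat' markers, 'X' filler."""
    body = b"Retreat" + b"X" * 25 + b"Retreat" + b"X" * 25 + b"Retreat"
    headers = bytes(HEADER_LEN)  # zeroed placeholder headers
    padding = b"X" * (PACKET_LEN - HEADER_LEN - len(body))
    return headers + body + padding

# The three L7 filter patterns from the slide.
PATTERNS = [
    rb"^Retreat",
    rb"^Retreat.*Retreat",
    rb"^Retreat.*Retreat.*Retreat",
]

def run_filters(packet):
    """Search the payload (everything after the headers) with each pattern."""
    payload = packet[HEADER_LEN:]
    return [re.search(pattern, payload) is not None for pattern in PATTERNS]

packet = build_packet()
print(len(packet), run_filters(packet))
```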
GPF Performance: Complex Filters (cont) [Figure: packet layout repeated from the previous slide]
GPF Performance: Complex Filters (cont) [Figure: packet layout repeated from the previous slide] • Lesson: try to use start-of-buffer indicators (^) and avoid *’s… • Many apps can be identified with simple start-of-buffer expressions • .NET Regex also involves payload copying, which might be avoidable
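As an illustration of identifying applications with simple start-of-buffer expressions, the following Python sketch anchors every pattern at the beginning of the payload. The `SIGNATURES` table and `identify` helper are hypothetical; the two patterns rely only on the well-known openings of HTTP requests and SSH version banners, and Python's `re` stands in for the .NET engine discussed in the talk.

```python
import re

# Hypothetical signature table: each pattern is anchored at the start of the buffer.
SIGNATURES = {
    "http": re.compile(rb"^(GET|POST|HEAD) "),
    "ssh": re.compile(rb"^SSH-"),
}

def identify(payload: bytes):
    """Return the first application whose anchored pattern matches, else None."""
    for app, pattern in SIGNATURES.items():
        if pattern.match(payload):
            return app
    return None

print(identify(b"GET /index.html HTTP/1.1\r\n"))
print(identify(b"SSH-2.0-OpenSSH\r\n"))
```

An anchored pattern only has to look at the first few bytes of the buffer, so a non-matching payload is rejected almost immediately instead of being scanned to the end.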
Thread Optimization • The choice of thread boundaries, thread scheduling, and packet FIFO implementations has a tremendous impact on overall performance • My current choice of four threads per module/port is too many… • It is too difficult to schedule the CPU optimally, and overall performance is at least 10X slower than should be possible • Threads also waste a lot of time waiting for locks on the packet FIFOs, which can likewise be avoided by reducing the # of threads
Performance Conclusions • RouterVM for .NET is just one possible implementation of RouterVM, and is only a demonstration of functionality, not performance • Many other performance aspects haven’t been mentioned, such as maintaining shared tables and per-flow state. • …Left for future presentations • Porting RouterVM to higher-performance parallel hardware should drastically increase performance • RouterVM’s 3000 cycles per packet per basic filter using .NET would be a terrible result for a network processor! • Dedicated search hardware is sorely needed… • It is trivial to come up with regular expression searches that require 200,000+ cycles per packet using .NET’s regular expression engine • Other regular expression libraries may be faster, but a software-only approach will rarely be good enough for high-performance datacenter apps
Comments on GPF Flexibility • We can show that GPFs are flexible by examining the following GPF properties: • Classification capabilities • Header fields only vs. headers + payloads • Stateless classifications vs. stateful, individual packets vs. specific flows • Simple field searches vs. complex general search expressions • Layer support: L1 through L7 • Action capabilities • Packet handling (allow, drop, packet generation/copying) • Packet rewriting (header field rewrites, truncation, header stripping/adding, checksum recalculations) • Control flow (filter jump/skip via tags, messaging to downstream filters & RouterVM elements such as the routing engine) • QoS support (e.g. rate limiting, WFQ, etc.) (cont)
Comments on GPF Flexibility (cont) • Maintaining shared state and GPF interaction • Efficient state sharing mechanism through tables or message passing • Maintaining per-flow state within a filter, and between filters • Mass storage capability (e.g. for content caching) • Computational Power • Simple, low-latency computations vs. complex, high-latency computations (e.g. NIDS, in-network antivirus scanning) • Specification Flexibility • Specific Application Support • Storage, XML, Wireless, etc.