360 likes | 384 Views
uspace. u. k. n. kspace. nspace. OS support for (flexible) high-speed networking. Herbert Bos. Vrije Universiteit Amsterdam. CPU. MEM. NIC. we’re hurting because of. hardware problems software problems language problems lack of flexibility network speed may grow too fast anyway
E N D
uspace u k n kspace nspace OS support for (flexible) high-speed networking Herbert Bos Vrije Universiteit Amsterdam
CPU MEM NIC we’re hurting because of • hardware problems • software problems • language problems • lack of flexibility • network speed may grow too fast anyway • multiple apps problem
What are we building? different barriers flows
uspace kernel What are we building? if (pkt.proto == TCP) then if (http) then scan webattacks; else if (smtp) scan spam;scan viruses fi else if (pkt.proto == UDP) then if (NFS traffic) mem = statistics (pkt); else scan SLAMMER; fi • reconfigure when load changes • dynamic optical infrastructure: • - configure at startup time- construct datapaths
- no ‘vertical’ copies - no ‘horizontal’ copies within flow group - more than ‘just filtering’ in kernel (e.g.,statistics) Application B ‘filter’ reduce copying • FFPF avoids both ‘horizontal’ and ‘vertical’ copies Application A U K
minimise copying, context switching flowgraph = DAG different copying regimes endianness new languages combine multiple requests control dataplane different buffer regimes
Software structure applications FFPF toolkit MAPI FFPF userspace libpcap userspace-kernel boundary FFPF kernelspace host- NIC boundary FFPF ixp2xxx
current status • working for Linux PC with • NICs • IXP1200s and IXP2400s (plugged in) • remote IXP2850 (with poetic license) • rudimentary support for distributed packet processor • speed comparable to tailored solutions
Thank you! http://ffpf.sourceforge.net/
Three flavours • Three flavours of packet processing on IXP & host • Regular • copy only when needed • may be slow depending on access pattern • Zero copy • never copy • may be slow depending on access pattern • Copy once • copy always 11
combining filters (dev,expr=/dev/ixp0) >[ [ (BPF,expr=“…”) >(PKTCOUNT) ] | (FPL2,expr=“…”) ]> (PKTCOUNT)
Example 1: writing applications • int main(int argc, char** argv) { • void *opaque_handle; • intcount, returnval; • /** pass the string to the subsystem. It initializes and returns an opaque pointer */ • opaque_handle = ffpf_open(argv[1], strlen(argv[1])); • if(start_listener(asynchronous_listener, NULL)) • { • printf (WARNING,“spawn failed\n"); • ffpf_close (opaque_handle); return -1; • } • count = ffpf_process(FFPF_PROCESS_FOREVER); • stop_listener(1); • returnval = ffpf_close(opaque_handle); • printf(,"processed %d packets\n",count); • return returnval; • }
Example 1: writing applications • void* asynchronous_listener(void* arg) { • int i; • while (!should_stop()) { • sleep(5); • for (i=0;i<export_len;i++) { • printf("flowgrabber %d: read %d, written%d\n", i, get_stats_read(exported[i]), get_stats_processed(exported[i])); • while ((pkt =get_packet_next(exported[i]))) { • dump_pkt(pkt); • } • } • } • return NULL; • }
Example 2: FPL-2 • new pkt processing language: FPL-2 • for IXP, kernel and userspace • simple, efficient and flexible • simple example: filter all webtraffic • IF ( (PKT.IP_PROTO == PROTO_TCP) && (PKT.TCP_PORT == 80)) THEN RETURN 1; • more complex example: count pkts in all TCP flows • IF (PKT.IP_PROTO == PROTO_TCP) THEN R[0] = Hash[ 14, 12, 1024]; M[ R[0] ]++;FI
FPL-2 • all common arithmetic and bitwise operations • all common logical ops • all common integer types • for packet • for buffer ( useful for keeping state!) • statements • Hash • External functions • to call hardware implementations • to call fast C implementations • If … then … else • For … break; … Rof • Return
Example application: dynamic ports 1. // R[0] stores no. of dynports found (initially 0) 2. IF (PKT.IP_PROTO==PROTO_TCP) THEN 3. IF (PKT.TCP_DPORT==554) THEN 4. M[R[0]]=EXTERN("GetDynTCPDPortFromRTSP",0,0); 5. R[0]++; 6. ELSE // compare pkt’s dst port to all ports in array – if match, return pkt 7. FOR (R[1]=0; R[1] < R[0]; R[1]++) 8. IF (PKT.TCP_DPORT == M[ R[1] ] ) THEN 9. RETURN TRUE; 10. FI 11. ROF 11. FI 12. FI 12. RETURN FALSE;
efficient userspace ? • reduced copying and context switches • sharing data • flowgraphs: sharing computations kernel ? network card ? ? x “push filtering tasks as far down the processing hierarchy as possible”
spread of SAPPHIRE in 30 minutes -process at lowest possible level-minimise copying -minimise context switching -freedom at the bottom Network Monitoring • Increasingly important • traffic characterisation, securitytraffic engineering, SLAs, billing, etc. • Existing solutions: • designed for slow networksor traffic engineering/QoS • not very flexible • We’re hurting because of • hardware (bus, memory) • software (copies, context switches) demand for solution:-scales to high link rates - scales in no. of apps - flexible
UDP with CodeRed eth0 TCP SYN bytecount HTTP U RTSP TCP IP “contains worm” UDP RTP UID 0 generalised notion of flow Flow: “a stream of packets that match arbitraryuser criteria” Flowgraph
Application B ‘filter’ reduce copying • FFPF avoids both ‘horizontal’ and ‘vertical’ copies Application A U K
Extensible • modular framework • language agnostic • plug-in filters (device,eth0) -> (sampler,2) -> (BPF,”..”) -> (packetcount) (device,eth0) | (device,eth1) -> (sampler,2) -> (FPL-2,”..”) | (BPF,”..”) -> (bytecount)
R O O O O O O O W Buffers • PacketBuf • circular buffer with N fixed-size slots • large enough to hold packet • IndexBuf • circular buffer with N slots • contains classification result + pointer
Buffers O O O • PacketBuf • circular buffer with N fixed-size slots • large enough to hold packet • IndexBuf • circular buffer with N slots • contains classification result + pointer O O O O W R
Buffers X X X • PacketBuf • circular buffer with N fixed-size slots • large enough to hold packet • IndexBuf • circular buffer with N slots • contains classification result + pointer X X O O W R
R1 Buffer management O O O O O O O O O O what to do if writer catches up with slowest reader? • slow reader preference • drop new packets (traditional way of dealing with this) • overall speed determined by slowest reader • fast reader preference • overwrite existing packets • application responsible for keeping up • can check that packets have been overwritten • different drop rates for different apps O O O O O O R1 W
IF (PKT.IP_PROTO == PROTO_TCP) THEN // reg.0 = hash over flow fields R[0] = Hash (14,12,1024) // increment pkt counter at this // location in MBuf MEM[ R[0] ]++FI • simple to use • compiles to optimised native code • resource limited (e.g., restricted FOR loop) • access to persistent storage (scratch memory) • calls to external functions (e.g., fast C functions or hardware assists) • compiler for uspace, kspace, and nspace (ixp1200) Languages • FFPF is language neutral • Currently we support: • BPF • C • OKE Cyclone • FPL
packet sources • currently three kinds implemented -netfilter -net_if_rx() -IXP1200 • implementation on IXPs: NIC-FIX -bottom of the processing hierarchy -eliminates mem & bus bottlenecks uspace kspace nspace
zero copy copy once on-demand copy Network Processors “programmable NIC”
Performance results pkt loss: FFPF: < 0.5% LSF: 2-3%
Performance results pkt loss: LSF:64-75% FFPF: 10-15%
Summary concept of ‘flow’ generalised copying and context switching minimised processing in kernel/NIC complex programs + ‘pipes’ FPL: FFPF Packet Languages fast + flexible persistent storage flow-specific state authorisation + third-party code any user flow groupsapplications sharing packet buffers
Thank you! http://ffpf.sourceforge.net/
CPU MEM NIC we’re hurting because of • kernel: too little, too often • vertical + horizontal copies • context switching • popular ones: not expressive • expressive ones: not popular • same for speed • no way to mix and match • hardware problems • software problems • language problems • lack of flexibility • network speed may grow too fast anyway • multiple apps problem • heterogeneous hardware • programmable cards, routers, etc. • no easy way to combine • build on abstractions that are too hi-level • hard to use as ‘generic’ packet processor • 10 40Gbps: what will you do in 10 ns? • beyond capacity of single processor