How to Minimize Transport Protocol Processing: Implementation and Evaluation of Network Level Framing. Pål Halvorsen, Thomas Plagemann, and Vera Goebel Institute for Informatics, University of Oslo Norway.
4th International Workshop on Multimedia Network Systems and Applications (MNSA ’02), Vienna, Austria, July 2002
Overview
• Application scenario
• The INSTANCE project
• Network Level Framing (NLF)
  • design and implementation
  • performance evaluation
• Summary and conclusions
Application Scenario
Media-on-Demand server: applicable to services such as News- or Video-on-Demand provided by city-wide cable or pay-per-view companies
[Figure: multimedia storage server connected to clients through networks]
• Retrieval is the bottleneck. Some important factors:
  • Memory management
  • Communication protocol processing
  • Error management
• Project goals: optimize performance within a single server:
  • Reduce resource requirements
  • Maximize number of clients
The INSTANCE Project
• We try to make optimal use of a given set of resources:
  • memory architecture
  • integrated error management
  • network level framing (NLF)
Traditional Approach
[Figure: upload to server, frequency: low (1); download from server, frequency: very high]
Network Level Framing (NLF): Basic Idea
[Figure: position of the transport layer processing in the data path; upload to server, frequency: low (1); download from server, frequency: very high]
Splitting the UDP Protocol
[Figure: udp_output() split into udp_PreOut() and udp_QuickOut()]
• udp_output():
  • Temporarily connect
  • Prepend UDP and IP headers
  • Prepare pseudo header for checksum
  • Calculate checksum
  • Fill in some other IP header fields
  • Hand over datagram to IP
  • Disconnect connected socket
• udp_PreOut():
  • Prepend UDP and IP headers
  • Prepare pseudo header for checksum, clear unknown fields
  • Precalculate checksum
• udp_QuickOut():
  • Update UDP and IP headers
  • Update checksum, i.e., only add checksum of prior unknown fields
  • Fill in other IP header fields
  • Hand over datagram to IP
Traditional Checksum Operations – I
• The UDP checksum covers three fields:
  • A 12-byte pseudo header containing fields from the IP header
  • The 8-byte UDP header
  • The UDP data (payload)
• Simplified checksum calculation function (in_cksum):
    u_int16_t *w;
    int checksum = 0;
    for each mbuf in packet {
        w = mbuf->m_data;
        while data in mbuf {
            checksum += *w;   /* add the 16-bit word, not the pointer */
            w++;
        }
    }
Traditional Checksum Operations – II
• Traditional checksum operation:
    u_int16_t *w;
    int checksum = 0;
    for each mbuf in packet {
        w = mbuf->m_data;
        while data in mbuf {
            checksum += *w;   /* add the 16-bit word, not the pointer */
            w++;
        }
    }
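The simplified in_cksum loop above can be fleshed out as a small, runnable C sketch. This is an illustration under stated assumptions, not the NetBSD implementation: `struct buf` is a hypothetical stand-in for an mbuf, `chain_cksum` and `fold16` are names invented here, and every buffer except the last is assumed to have an even length (the real in_cksum also handles odd-length mbufs in the middle of a chain).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a BSD mbuf: one buffer in a chain. */
struct buf {
    const uint8_t *data;
    size_t len;
    struct buf *next;
};

/* Fold a wide accumulator into 16 bits with end-around carry. */
static uint16_t fold16(uint64_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/*
 * Internet checksum over a buffer chain: every 16-bit word of every
 * buffer is touched, so the cost grows linearly with the data size.
 * Assumption: all buffers except the last have an even length.
 */
static uint16_t chain_cksum(const struct buf *b)
{
    uint64_t sum = 0;

    for (; b != NULL; b = b->next) {
        size_t i = 0;
        for (; i + 1 < b->len; i += 2)
            sum += (uint32_t)((b->data[i] << 8) | b->data[i + 1]);
        if (i < b->len)                 /* trailing odd byte, zero padded */
            sum += (uint32_t)(b->data[i] << 8);
    }
    return (uint16_t)~fold16(sum);
}
```

Calling `chain_cksum` on a chain whose buffers together hold the pseudo header, the UDP header, and the payload would give the value for the UDP checksum field; the point of the sketch is that the whole payload is read on every transmission.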
Modified Checksum Operations
• NLF checksum operation:
[Figure: partial checksums combined by addition]
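The NLF checksum operation relies on the Internet checksum being a one's-complement sum, which can be computed in pieces and added together afterwards. The sketch below illustrates that property only; the function names (`nlf_partial_sum`, `nlf_combine`, `nlf_finish`) are hypothetical, and it assumes the payload starts at an even byte offset within the checksummed data, which holds for a 12-byte pseudo header followed by an 8-byte UDP header.

```c
#include <stddef.h>
#include <stdint.h>

/* Folded one's-complement sum of a byte range (not yet inverted). */
static uint16_t nlf_partial_sum(const uint8_t *p, size_t len)
{
    uint64_t sum = 0;
    size_t i = 0;

    for (; i + 1 < len; i += 2)
        sum += (uint32_t)((p[i] << 8) | p[i + 1]);
    if (i < len)                        /* trailing odd byte, zero padded */
        sum += (uint32_t)(p[i] << 8);
    while (sum >> 16)                   /* end-around carry */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Add two partial sums; this is constant time, independent of data size. */
static uint16_t nlf_combine(uint16_t a, uint16_t b)
{
    uint32_t sum = (uint32_t)a + b;
    return (uint16_t)((sum & 0xffff) + (sum >> 16));
}

/* Invert the final sum to obtain the checksum field value. */
static uint16_t nlf_finish(uint16_t sum)
{
    return (uint16_t)~sum;
}
```

In this sketch the payload sum would be computed once, off-line, and stored as meta-data; at transmission time only the short header is summed and the two values are combined with `nlf_combine`.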
Implementation – I
• Straightforward implementation:
  • To allow flexibility, we have one data file and one meta-data file
[Figure: data file and meta-data file holding the precalculated header information (meta-data), feeding UDP]
Implementation – II
• NLF version 1:
  • most of the UDP/IP processing time is spent on checksum calculation
  • precalculate checksum over data payload
  • during transmission time:
    • generate header
    • calculate checksum over header and add precalculated payload checksum
• NLF version 2:
  • several reports show increased performance using header templates
  • precalculate checksum over data payload
  • during stream open:
    • generate header template
    • calculate header checksum
  • during transmission time:
    • block copy header template
    • add header template checksum, payload checksum, and packet length field
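The version 2 steps can be sketched in C as follows. This is a minimal illustration, not the authors' kernel code: `struct nlf_stream`, `nlf_open`, and `nlf_fill_header` are hypothetical names, only the 8-byte UDP header is modelled (the IP header is left out), and addresses and ports are passed in host byte order for simplicity. The UDP length is added twice because it appears both in the pseudo header and in the UDP header.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define UDP_HDR_LEN 8

/* Hypothetical per-stream state, built once at stream open. */
struct nlf_stream {
    uint8_t  hdr_template[UDP_HDR_LEN]; /* ports set; length, checksum zero */
    uint16_t template_sum;              /* folded sum: pseudo header + template */
};

static uint16_t fold16(uint64_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Store a 16-bit value in network (big-endian) byte order. */
static void put16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xff);
}

/* Stream open: generate the header template and its checksum. */
static void nlf_open(struct nlf_stream *s, uint32_t src_ip, uint32_t dst_ip,
                     uint16_t src_port, uint16_t dst_port)
{
    memset(s->hdr_template, 0, UDP_HDR_LEN);
    put16(s->hdr_template, src_port);
    put16(s->hdr_template + 2, dst_port);
    /* Everything known at open time; 17 is the IP protocol number of UDP. */
    s->template_sum = fold16((uint64_t)(src_ip >> 16) + (src_ip & 0xffff) +
                             (dst_ip >> 16) + (dst_ip & 0xffff) +
                             17u + src_port + dst_port);
}

/* Transmission time: block copy the template, then patch length and checksum. */
static void nlf_fill_header(const struct nlf_stream *s, uint8_t *hdr,
                            uint16_t payload_sum, uint16_t payload_len)
{
    uint16_t udp_len = (uint16_t)(UDP_HDR_LEN + payload_len);
    uint16_t ck;

    memcpy(hdr, s->hdr_template, UDP_HDR_LEN);
    put16(hdr + 4, udp_len);
    ck = (uint16_t)~fold16((uint64_t)s->template_sum + payload_sum +
                           2u * udp_len);
    if (ck == 0)            /* UDP transmits an all-zero result as 0xffff */
        ck = 0xffff;
    put16(hdr + 6, ck);
}
```

Here `payload_sum` stands for the precalculated, folded payload sum read from the meta-data; the per-packet work is a fixed number of additions regardless of payload size.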
Performance: Test Setup
• Implemented in NetBSD 1.5.2
• Dell Precision Workstation 620
  • PIII 933 MHz CPU
  • 3Com 1 Gbps NIC
• Software probe
  • RDTSC instruction
  • CPUID instruction
  • probe overhead 206 cycles
• Performed tests using 1 KB, 2 KB, 4 KB, and 8 KB UDP packets
• Transmitting 225 MB of data
• Data is transmitted using the zero-copy data path
Performance: Checksum
[Figure: CPU cycles vs. packet size. Annotations: "Overhead increases linearly with payload size" (labelled values 11899 and 23674); "Overhead is constant regardless of payload"; "~50 cycles less"]
Performance: Header
[Figure: overhead in CPU cycles vs. packet size. Annotation: "~25 cycles more"]
• NLF version 3: use header template checksum, but generate header instead of block copy
Performance: UDP
[Figure: CPU cycles vs. packet size (labelled values 12304 and 24108)]
Conclusions and Future Work
• Network Level Framing reduces communication system processing by precalculating
  • payload checksum (off-line)
  • header checksum (stream open)
• Gain per packet depends on the packet payload size, e.g., 97.3 % for 1 KB packets and 99.6 % for 8 KB packets
• Our mechanisms (at least) double the number of concurrent clients
• Ongoing and future work:
  • NLF in lower protocols (ongoing)
  • On-board processing
Related Work
• Checksum caching in memory
  • at high data rates, cached elements will be removed before they can be reused
• Header templates
  • block copying is time consuming
• On-board processing
  • useful and becoming “off-the-shelf” hardware
  • may be nice to combine with NLF