Pål Halvorsen, Thomas Plagemann, and Vera Goebel Institute for Informatics, University of Oslo

How to Minimize Transport Protocol Processing: Implementation and Evaluation of Network Level Framing
Pål Halvorsen, Thomas Plagemann, and Vera Goebel
Institute for Informatics, University of Oslo, Norway
4th International Workshop on Multimedia Network Systems and Applications (MNSA ’02), Vienna, Austria, July 2002

Overview
• Application scenario
• The INSTANCE project
• Network Level Framing (NLF)
  • design and implementation
  • performance evaluation
• Summary and conclusions

Application Scenario
Media-on-Demand server: applicable in applications like News- or Video-on-Demand provided by city-wide cable or pay-per-view companies
[Figure labels: Multimedia Storage Server, Network, Network]
• Retrieval is the bottleneck. Some important factors:
  • Memory management
  • Communication protocol processing
  • Error management
• Project goals: optimize performance within a single server:
  • Reduce resource requirements
  • Maximize number of clients

The INSTANCE Project
• We try to make optimal use of a given set of resources:
  • memory architecture
  • integrated error management
  • network level framing (NLF)

Traditional Approach
• Upload to server: frequency low (1)
• Download from server: frequency very high

Network Level Framing (NLF): Basic Idea
[Figure: protocol stack diagram with TRANSPORT layer boxes]
• Upload to server: frequency low (1)
• Download from server: frequency very high

When to Store Packets

Splitting the UDP Protocol
• Traditional udp_output():
  • Temporarily connect
  • Prepend UDP and IP headers
  • Prepare pseudo header for checksum
  • Calculate checksum
  • Fill in other IP header fields
  • Hand over datagram to IP
  • Disconnect connected socket
• udp_PreOut():
  • Prepend UDP and IP headers
  • Prepare pseudo header for checksum, clear unknown fields
  • Precalculate checksum
• udp_QuickOut():
  • Update UDP and IP headers
  • Update checksum, i.e., only add checksum of prior unknown fields
  • Fill in some other IP header fields
  • Hand over datagram to IP

Traditional Checksum Operations – I
• The UDP checksum covers three fields:
  • A 12-byte pseudo header containing fields from the IP header
  • The 8-byte UDP header
  • The UDP data (payload)
• Simplified checksum calculation function (in_cksum):
    u_int16_t *w;
    int checksum = 0;
    for each mbuf in packet {
        w = mbuf->m_data;
        while data in mbuf {
            checksum += *w;   /* add the 16-bit word that w points to */
            w++;
        }
    }

Traditional Checksum Operations – II
• Traditional checksum operation:
    u_int16_t *w;
    int checksum = 0;
    for each mbuf in packet {
        w = mbuf->m_data;
        while data in mbuf {
            checksum += *w;   /* add the 16-bit word that w points to */
            w++;
        }
    }
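The simplified loop above can be made concrete as follows. This is a minimal sketch of the standard Internet checksum (RFC 1071) over a chain of buffers, not the NetBSD in_cksum source: `struct seg` and `chain_cksum` are hypothetical names standing in for the mbuf chain, and every segment except the last is assumed to have even length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a BSD mbuf: one segment of packet data. */
struct seg {
    const uint8_t *data;      /* first byte of this segment */
    size_t len;               /* number of bytes in this segment */
    const struct seg *next;   /* next segment, or NULL */
};

/* Internet checksum over all segments: sum 16-bit big-endian words with
 * end-around carry, then return the one's complement of the sum. */
uint16_t chain_cksum(const struct seg *s)
{
    uint32_t sum = 0;

    for (; s != NULL; s = s->next) {
        size_t i = 0;

        /* Assumption of this sketch: only the last segment may be odd. */
        assert(s->next == NULL || s->len % 2 == 0);

        for (; i + 1 < s->len; i += 2)
            sum += (uint32_t)((s->data[i] << 8) | s->data[i + 1]);
        if (i < s->len)                       /* odd trailing byte, zero padded */
            sum += (uint32_t)(s->data[i] << 8);

        while (sum >> 16)                     /* fold carries back into 16 bits */
            sum = (sum & 0xFFFF) + (sum >> 16);
    }
    return (uint16_t)~sum;
}
```

The cost of this loop grows with the number of payload bytes, which is the per-packet work that NLF tries to move out of the transmission path.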

Modified Checksum Operations
• NLF checksum operation:
[Figure: partial checksums are added (+ + =) to give the packet checksum]
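The NLF operation relies on a property of the Internet checksum: one's-complement sums over separate parts of a packet can be added afterwards. A minimal sketch, assuming the payload sum was stored earlier and the header has even length; `partial_sum`, `add_sums`, and `nlf_cksum` are illustrative names, not functions from the NLF implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's-complement sum of big-endian 16-bit words, folded to 16 bits but
 * NOT complemented, so that partial sums can be combined later.
 * Intended for buffers up to 64 KB (the 32-bit accumulator cannot overflow). */
uint16_t partial_sum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;
    size_t i = 0;

    for (; i + 1 < len; i += 2)
        sum += (uint32_t)((p[i] << 8) | p[i + 1]);
    if (i < len)                              /* odd trailing byte, zero padded */
        sum += (uint32_t)(p[i] << 8);
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)sum;
}

/* One's-complement addition of two partial sums (end-around carry). */
uint16_t add_sums(uint16_t a, uint16_t b)
{
    uint32_t sum = (uint32_t)a + b;
    return (uint16_t)((sum & 0xFFFF) + (sum >> 16));
}

/* Transmission-time step: only the header bytes are summed; the payload
 * sum is taken as a stored value. */
uint16_t nlf_cksum(const uint8_t *hdr, size_t hdr_len, uint16_t stored_payload_sum)
{
    assert(hdr_len % 2 == 0);   /* payload must start on a 16-bit boundary */
    return (uint16_t)~add_sums(partial_sum(hdr, hdr_len), stored_payload_sum);
}
```

With this split, the transmission-time cost depends only on the header length, not on the payload size.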

Implementation – I
• Straightforward implementation:
  • To allow flexibility, we have one data file and one meta-data file
[Figure labels: data, precalculated header (meta-data), UDP]

Implementation – II
• NLF version 1:
  • most of the UDP/IP processing time is spent on checksum calculation
  • precalculate checksum over data payload
  • during transmission time:
    • generate header
    • calculate checksum over header and add precalculated payload checksum
• NLF version 2:
  • several reports show increased performance using header templates
  • precalculate checksum over data payload
  • during stream open:
    • generate header template
    • calculate header checksum
  • during transmission time:
    • block copy header template
    • add header template checksum, payload checksum, and packet length field
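The version 2 steps can be sketched for the UDP header alone (the IP header is left out to keep the example short). This is an illustration under stated assumptions, not the NetBSD code: `struct hdr_template`, `template_init`, and `nlf2_fill_header` are hypothetical names, and the payload sum is assumed to be an uncomplemented 16-bit one's-complement sum read from the meta-data.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define UDP_HDR_LEN 8

/* Hypothetical per-stream state, built once at stream open. */
struct hdr_template {
    uint8_t  bytes[UDP_HDR_LEN]; /* UDP header, length and checksum zeroed */
    uint16_t sum;                /* sum over pseudo header + ports, no length */
};

/* One's-complement addition with end-around carry. */
static uint16_t add16(uint16_t a, uint16_t b)
{
    uint32_t s = (uint32_t)a + b;
    return (uint16_t)((s & 0xFFFF) + (s >> 16));
}

/* Stream open: generate the header template and its checksum. */
void template_init(struct hdr_template *t,
                   const uint8_t src_ip[4], const uint8_t dst_ip[4],
                   uint16_t sport, uint16_t dport)
{
    uint16_t sum = 0;

    memset(t->bytes, 0, UDP_HDR_LEN);
    t->bytes[0] = (uint8_t)(sport >> 8);
    t->bytes[1] = (uint8_t)(sport & 0xFF);
    t->bytes[2] = (uint8_t)(dport >> 8);
    t->bytes[3] = (uint8_t)(dport & 0xFF);

    sum = add16(sum, (uint16_t)((src_ip[0] << 8) | src_ip[1]));
    sum = add16(sum, (uint16_t)((src_ip[2] << 8) | src_ip[3]));
    sum = add16(sum, (uint16_t)((dst_ip[0] << 8) | dst_ip[1]));
    sum = add16(sum, (uint16_t)((dst_ip[2] << 8) | dst_ip[3]));
    sum = add16(sum, 17);        /* zero byte + protocol number (UDP = 17) */
    sum = add16(sum, sport);
    sum = add16(sum, dport);
    t->sum = sum;
}

/* Transmission time: block copy the template, then add the template sum,
 * the stored payload sum, and the packet length field. */
void nlf2_fill_header(uint8_t *out, const struct hdr_template *t,
                      uint16_t payload_len, uint16_t payload_sum)
{
    uint16_t udp_len, sum;

    assert(payload_len <= 65535 - UDP_HDR_LEN);
    udp_len = (uint16_t)(UDP_HDR_LEN + payload_len);

    memcpy(out, t->bytes, UDP_HDR_LEN);
    out[4] = (uint8_t)(udp_len >> 8);
    out[5] = (uint8_t)(udp_len & 0xFF);

    sum = add16(t->sum, payload_sum);
    sum = add16(sum, udp_len);   /* length in the pseudo header */
    sum = add16(sum, udp_len);   /* length in the UDP header */
    sum = (uint16_t)~sum;
    if (sum == 0)                /* UDP sends a computed zero as all ones */
        sum = 0xFFFF;
    out[6] = (uint8_t)(sum >> 8);
    out[7] = (uint8_t)(sum & 0xFF);
}
```

The length is added twice because the UDP checksum covers it both in the pseudo header and in the UDP header itself.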

Performance: Test Setup
• Implemented in NetBSD 1.5.2
• Dell Precision Workstation 620
  • PIII 933 MHz CPU
  • 3Com 1 Gbps NIC
• Software probe
  • RDTSC instruction
  • CPUID instruction
  • probe overhead 206 cycles
• Performed tests using 1 KB, 2 KB, 4 KB, and 8 KB UDP packets
• Transmitting 225 MB of data
• Data is transmitted using the zero-copy data path
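A cycle-counting probe of this kind can be sketched as below. The slides do not show the probe code, so this is only an assumption of how RDTSC and CPUID are typically combined: CPUID serializes execution before RDTSC samples the time-stamp counter. `probe_read` and `net_cycles` are hypothetical names; the inline assembly is GCC/Clang syntax and x86 only.

```c
#include <assert.h>
#include <stdint.h>

/* Read the time-stamp counter after a serializing CPUID, so that earlier
 * instructions have completed before the counter is sampled.
 * Returns 0 on non-x86 targets, where the instructions do not exist. */
static inline uint64_t probe_read(void)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t eax = 0, ebx, ecx = 0, edx, lo, hi;

    __asm__ __volatile__("cpuid"
                         : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx)
                         :
                         : "memory");
    (void)ebx;
    (void)edx;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
#else
    return 0;
#endif
}

/* Cycles spent in the measured code: raw difference between two probe
 * readings minus the cost of the probe itself (206 cycles in this setup).
 * Clamped at zero if the raw difference is smaller than the overhead. */
uint64_t net_cycles(uint64_t start, uint64_t end, uint64_t overhead)
{
    uint64_t raw;

    assert(end >= start);
    raw = end - start;
    return raw > overhead ? raw - overhead : 0;
}
```

A measurement then brackets the code under test with two `probe_read()` calls and passes both readings to `net_cycles` with the measured probe overhead.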

Performance: Checksum
[Figure: CPU cycles vs. packet size. Annotations: overhead increases linearly with payload size (marked values: 11899 and 23674 cycles); overhead is constant regardless of payload; ~50 cycles less]

Performance: Header Overhead
[Figure: CPU cycles vs. packet size. Annotation: ~25 cycles more]
• NLF version 3: use the header template checksum, but generate the header instead of block copying it

Performance: UDP
[Figure: CPU cycles vs. packet size. Marked values: 12304 and 24108 cycles]

Conclusions and Future Work
• Network Level Framing reduces communication system processing by precalculating
  • payload checksum (off-line)
  • header checksum (stream open)
• Gain per packet depends on packet payload size, e.g., 1 KB (8 KB) → 97.3 % (99.6 %)
• Our mechanisms (at least) double the number of concurrent clients
• Ongoing and future work:
  • NLF in lower protocols (ongoing)
  • On-board processing

Questions?

Related Work
• Checksum caching in memory
  • high data rates → cached elements will be removed before they can be reused
• Header templates
  • block copying is time consuming
• On-board processing
  • useful and becoming “off-the-shelf” hardware
  • may be nice to combine with NLF

