Checksum ‘offloading’: A look at how the Pro1000 NICs can be programmed to compute and insert TCP/IP checksums
Network efficiency • Last time (in our ‘nictcp.c’ demo) we saw the amount of work a CPU would need to do when setting up an ethernet packet for transmission with TCP/IP protocol format • In a busy network this amount of packet-computation becomes a ‘bottleneck’ that degrades overall system performance • But a lot of that work can be ‘offloaded’!
The ‘loops’ are costly • To prepare for a packet-transmission, the device-driver has to execute a few dozen assignment-statements, to set up fields in the packet’s ‘headers’ and in the Transmit Descriptor that will be used by the NIC • Most of these assignments involve simple memory-to-memory copying of parameters • But the ‘checksum’ fields require ‘loops’
Can’t ‘unroll’ checksum-loops • One programming technique for speeding up loop-execution is known as ‘unrolling’, which avoids the ‘test-and-branch’ inefficiency:
    int sum = 0;
    sum += wp[0];
    sum += wp[1];
    sum += wp[2];
    …
    sum += wp[99];
• But it requires knowing in advance what number of loop-iterations will be needed, and our packets’ lengths vary
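For comparison, here is a minimal sketch of the kind of loop the CPU has to run when the word-count is only known at run time. The function name ‘inet_checksum’ and its parameters are illustrative assumptions, not code taken from ‘nictcp.c’; the words are treated as plain 16-bit values, so byte-order handling is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: one's complement checksum over 'nwords' 16-bit words.
   The loop count depends on the packet, which is why it cannot be unrolled
   ahead of time. */
uint16_t inet_checksum(const uint16_t *wp, size_t nwords)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < nwords; i++)   /* one test-and-branch per word */
        sum += wp[i];

    while (sum >> 16)                     /* fold the carries back in */
        sum = (sum & 0xFFFF) + (sum >> 16);

    return (uint16_t)~sum;                /* complement of the folded sum */
}
```

Every iteration pays for an add plus a test-and-branch, and this is the work the NIC can take over.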
The ‘offload’ solution • Modern network controllers can be built to perform TCP/IP checksum calculations on packet-data as it is being fetched from RAM • This relieves a CPU from having to do the most intense portion of packet preparation • But ‘checksum offloading’ is an optional capability that has to be ‘enabled’ – and ‘programmed’ for a specific packet-layout
‘Context’ descriptors • Intel’s Pro1000 network controllers employ special ‘Context’ Transmit-Descriptors for enabling and configuring the ‘checksum-offloading’ capability • Two kinds of Context Descriptor are used: • An ‘Offload’ Context Descriptor (Type 0) • A ‘Data’ Context Descriptor (Type 1)
Context descriptor (type 0)
    First quadword:  63-48 TUCSE | 47-40 TUCSO | 39-32 TUCSS | 31-16 IPCSE | 15-8 IPCSO | 7-0 IPCSS
    Second quadword: 63-48 MSS | 47-40 HDRLEN | 39-36 RSV | 35-32 STA | 31-24 TUCMD | 23-20 DTYP (=0) | 19-0 PAYLEN
    DEXT=1 (Extended Descriptor)
Legend: IPCSS (IP CheckSum Start) TUCSS (TCP/UDP CheckSum Start) IPCSO (IP CheckSum Offset) TUCSO (TCP/UDP CheckSum Offset) IPCSE (IP CheckSum Ending) TUCSE (TCP/UDP CheckSum Ending) PAYLEN (Payload Length) DTYP (Descriptor Type) TUCMD (TCP/UDP Command) STA (TCP/UDP Status) RSV (Reserved) HDRLEN (Header Length) MSS (Maximum Segment Size)
The TUCMD byte
    bit 7: IDE | bit 6: SNAP | bit 5: DEXT (=1) | bit 4: reserved (=0) | bit 3: RS | bit 2: TSE | bit 1: IP | bit 0: TCP
Legend: IDE (Interrupt Delay Enable) SNAP (Sub-Network Access Protocol) DEXT (Descriptor Extension) RS (Report Status) TSE (TCP-Segmentation Enable) IP (Internet Protocol) TCP (Transmission Control Protocol)
Color key on the original slide: some bits are always valid, the others are valid only when TSE=1
Context descriptor (type 1)
    First quadword:  63-0 ADDRESS
    Second quadword: 63-48 VLAN | 47-40 POPTS | 39-36 RSV | 35-32 STA | 31-24 DCMD | 23-20 DTYP (=1) | 19-0 DTALEN
    DEXT=1 (Extended Descriptor)
Legend: DTALEN (Data Length) DTYP (Descriptor Type) DCMD (Descriptor Command) STA (Status) RSV (Reserved) POPTS (Packet Options) VLAN (VLAN tag)
The DCMD byte
    bit 7: IDE | bit 6: VLE | bit 5: DEXT (=1) | bit 4: reserved (=0) | bit 3: RS | bit 2: TSE | bit 1: IFCS | bit 0: EOP
Legend: IDE (Interrupt Delay Enable) VLE (VLAN Enable) DEXT (Descriptor Extension) RS (Report Status) TSE (TCP-Segmentation Enable) IFCS (Insert Frame CheckSum) EOP (End Of Packet)
Color key on the original slide: some bits are always valid, the others are valid only when EOP=1
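The two command bytes can be written down as named bit-masks, which makes the later shift-expressions easier to read. This is only a sketch: the constant names below are assumptions made for illustration (they are not identifiers from ‘offload.c’), and the bit positions are the ones listed on the TUCMD and DCMD slides.

```c
/* Hypothetical names for the TUCMD bits (Type 0 descriptor) */
enum {
    TUCMD_TCP  = 1 << 0,   /* packet is TCP */
    TUCMD_IP   = 1 << 1,   /* Internet Protocol */
    TUCMD_TSE  = 1 << 2,   /* TCP-Segmentation Enable */
    TUCMD_RS   = 1 << 3,   /* Report Status */
    TUCMD_DEXT = 1 << 5,   /* Descriptor Extension (must be 1) */
    TUCMD_SNAP = 1 << 6,   /* Sub-Network Access Protocol */
    TUCMD_IDE  = 1 << 7    /* Interrupt Delay Enable */
};

/* Hypothetical names for the DCMD bits (Type 1 descriptor) */
enum {
    DCMD_EOP   = 1 << 0,   /* End Of Packet */
    DCMD_IFCS  = 1 << 1,   /* Insert Frame CheckSum */
    DCMD_TSE   = 1 << 2,   /* TCP-Segmentation Enable */
    DCMD_RS    = 1 << 3,   /* Report Status */
    DCMD_DEXT  = 1 << 5,   /* Descriptor Extension (must be 1) */
    DCMD_VLE   = 1 << 6,   /* VLAN Enable */
    DCMD_IDE   = 1 << 7    /* Interrupt Delay Enable */
};
```

With these names, ‘DEXT=1, RS=1’ becomes TUCMD_DEXT | TUCMD_RS, which is the same value as (1<<5)|(1<<3).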
Our usage example • We’ve created a module named ‘offload.c’ which demonstrates the NIC’s checksum-offloading capability for TCP/IP packets • It’s a modification of our earlier ‘nictcp.c’ character-mode device-driver module • We have excerpted the main changes in a class-handout – the full version is online
Data-type definitions
// Our type-definition for the ‘Type 0’ Context-Descriptor
typedef struct {
    unsigned char   ipcss;
    unsigned char   ipcso;
    unsigned short  ipcse;
    unsigned char   tucss;
    unsigned char   tucso;
    unsigned short  tucse;
    unsigned int    paylen:20;
    unsigned int    dtyp:4;
    unsigned int    tucmd:8;
    unsigned char   status;
    unsigned char   hdrlen;
    unsigned short  mss;
} TX_CONTEXT_OFFLOAD;
Definitions (continued)
// Our type-definition for the ‘Type 1’ Context-Descriptor
typedef struct {
    unsigned long long  base_addr;
    unsigned int        dtalen:20;
    unsigned int        dtyp:4;
    unsigned int        dcmd:8;
    unsigned char       status;
    unsigned char       pkt_opts;
    unsigned short      vlan_tag;
} TX_CONTEXT_DATA;

typedef union {
    TX_CONTEXT_OFFLOAD  off;
    TX_CONTEXT_DATA     dat;
} TX_DESCRIPTOR;
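Both descriptor formats have to occupy exactly 16 bytes (two quadwords), and bit-field packing is implementation-defined in C, so it can be worth letting the compiler confirm the size. The sketch below repeats the two type-definitions and adds compile-time checks; it assumes a C11 compiler that packs bit-fields the way GCC does on x86, and the checks themselves are an addition that is not part of the class handout.

```c
/* Type 0 (offload) context descriptor, as defined on the slide */
typedef struct {
    unsigned char   ipcss, ipcso;
    unsigned short  ipcse;
    unsigned char   tucss, tucso;
    unsigned short  tucse;
    unsigned int    paylen:20;
    unsigned int    dtyp:4;
    unsigned int    tucmd:8;
    unsigned char   status, hdrlen;
    unsigned short  mss;
} TX_CONTEXT_OFFLOAD;

/* Type 1 (data) context descriptor, as defined on the slide */
typedef struct {
    unsigned long long  base_addr;
    unsigned int        dtalen:20;
    unsigned int        dtyp:4;
    unsigned int        dcmd:8;
    unsigned char       status, pkt_opts;
    unsigned short      vlan_tag;
} TX_CONTEXT_DATA;

typedef union {
    TX_CONTEXT_OFFLOAD  off;
    TX_CONTEXT_DATA     dat;
} TX_DESCRIPTOR;

/* Added checks (assumption: C11 _Static_assert is available) */
_Static_assert(sizeof(TX_CONTEXT_OFFLOAD) == 16, "Type 0 must be 16 bytes");
_Static_assert(sizeof(TX_CONTEXT_DATA) == 16, "Type 1 must be 16 bytes");
_Static_assert(sizeof(TX_DESCRIPTOR) == 16, "ring entries must be 16 bytes");
```

If a compiler lays the bit-fields out differently, the build fails here instead of the NIC silently misreading the ring.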
Our packets’ layout
    Ethernet Header (14 bytes)
    IP Header (20 bytes, no options): HDR CKSUM lies 10 bytes into this header
    TCP Header (20 bytes, no options): TCP CKSUM lies 16 bytes into this header
    Packet-Data (length varies)
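The offsets that get programmed into the Type 0 descriptor follow directly from this layout. As a small worked sketch (the constant names are made up here for illustration), the arithmetic can be written as:

```c
/* Hypothetical constants derived from the packet layout above */
enum {
    PKT_ETH_HLEN = 14,                            /* Ethernet header        */
    PKT_IP_HLEN  = 20,                            /* IP header, no options  */
    PKT_TCP_HLEN = 20,                            /* TCP header, no options */

    PKT_IPCSS = PKT_ETH_HLEN,                     /* IP header starts here  */
    PKT_IPCSO = PKT_IPCSS + 10,                   /* IP checksum field      */
    PKT_TUCSS = PKT_ETH_HLEN + PKT_IP_HLEN,       /* TCP header starts here */
    PKT_TUCSO = PKT_TUCSS + 16,                   /* TCP checksum field     */
    PKT_HDR_TOTAL = PKT_TUCSS + PKT_TCP_HLEN      /* bytes ahead of data    */
};
```

This gives 14, 24, 34 and 50 for the start and offset values, and 54 bytes of headers in front of the packet-data.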
How we use contexts • Our ‘offload.c’ driver will send a ‘Type 0’ Context Descriptor within ‘module_init()’
    txring[ 0 ].off.ipcss = 14;    // IP-header CheckSum Start
    txring[ 0 ].off.ipcso = 24;    // IP-header CheckSum Offset
    txring[ 0 ].off.ipcse = 34;    // IP-header CheckSum Ending
    txring[ 0 ].off.tucss = 34;    // TCP/UDP-segment CheckSum Start
    txring[ 0 ].off.tucso = 50;    // TCP/UDP-segment CheckSum Offset
    txring[ 0 ].off.tucse = 0;     // TCP/UDP-segment CheckSum Ending
    txring[ 0 ].off.dtyp = 0;      // Type 0 Context Descriptor
    txring[ 0 ].off.tucmd = (1<<5)|(1<<3);    // DEXT=1, RS=1
    iowrite32( 1, io + E1000_TDT );    // give ownership to NIC
Using contexts (continued) • Our ‘offload.c’ driver will then use a Type 1 context descriptor every time its ‘write()’ function is called to transmit user-data • The network controller ‘remembers’ the checksum-offloading parameters that we sent during module-initialization, and so it continues to apply them to every outgoing packet (we keep our same packet-layout)
Sequence of ‘write()’ steps • Adjust the ‘len’ argument (if necessary) • Copy ‘len’ bytes from the user’s ‘buf’ array • Prepend the packet’s TCP Header • Insert the pseudo-header’s checksum • Prepend the packet’s IP Header • Prepend the packet’s Ethernet Header • Initialize the Data-Context Tx-Descriptor • Give descriptor-ownership to the NIC
The TCP pseudo-header • We do initialize the TCP Checksum field (but this needs only a short computation) • The one’s complement sum of these six 16-bit words is placed into ‘TCP Checksum’:
    Zero | Protocol-ID (= 6)    (one word)
    TCP Segment-length          (one word)
    Source IP-address           (two words)
    Destination IP-address      (two words)
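The six-word sum is short enough to write without a loop. The following is a minimal sketch, assuming the IP-addresses are passed as 32-bit values in host byte order; the function name ‘tcp_pseudo_sum’ is hypothetical, and a real driver would still need to store the result in network byte order.

```c
#include <stdint.h>

/* Hypothetical helper: one's complement sum of the six pseudo-header words.
   Note that the result is NOT complemented; the NIC finishes the job. */
uint16_t tcp_pseudo_sum(uint32_t src_ip, uint32_t dst_ip, uint16_t seg_len)
{
    uint32_t sum = 0;

    sum += (src_ip >> 16) & 0xFFFF;   /* Source IP-address, high word      */
    sum += src_ip & 0xFFFF;           /* Source IP-address, low word       */
    sum += (dst_ip >> 16) & 0xFFFF;   /* Destination IP-address, high word */
    sum += dst_ip & 0xFFFF;           /* Destination IP-address, low word  */
    sum += 6;                         /* Zero byte + Protocol-ID (TCP = 6) */
    sum += seg_len;                   /* TCP Segment-length                */

    while (sum >> 16)                 /* fold the carries back in */
        sum = (sum & 0xFFFF) + (sum >> 16);

    return (uint16_t)sum;
}
```

For example, with source 192.168.0.1, destination 192.168.0.199 and a 20-byte segment, the six words add up to 0x18232, which folds to 0x8233.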
Setting up the Type-1 Context
    int txtail = ioread32( io + E1000_TDT );
    txring[ txtail ].dat.base_addr = tx_desc + (txtail * TX_BUFSIZ);
    txring[ txtail ].dat.dtalen = 54 + len;
    txring[ txtail ].dat.dtyp = 1;
    txring[ txtail ].dat.dcmd = 0;
    txring[ txtail ].dat.status = 0;
    txring[ txtail ].dat.pkt_opts = 3;         // IXSM=1, TXSM=1
    txring[ txtail ].dat.vlan_tag = vlan_id;
    txring[ txtail ].dat.dcmd |= (1<<0);       // EOP (End-Of-Packet)
    txring[ txtail ].dat.dcmd |= (1<<3);       // RS (Report Status)
    txring[ txtail ].dat.dcmd |= (1<<5);       // DEXT (Descriptor Extension)
    txring[ txtail ].dat.dcmd |= (1<<6);       // VLE (VLAN Enable)
    txtail = (1 + txtail) % N_TX_DESC;
    iowrite32( txtail, io + E1000_TDT );
In-class demonstration • We can demonstrate checksum-offloading by using our ‘dram.c’ device-driver to look at the packet that is being transmitted from one of our ‘anchor’ machines, and to look at the packet that gets received by another ‘anchor’ machine • The checksum-fields (at offsets 24 and 50) do get modified by the network hardware!
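One way to confirm that the hardware filled in a correct IP checksum is to sum the received header with its checksum field included: a valid header folds to 0xFFFF. The sketch below assumes our packet-layout (IP header at byte 14, 20 bytes long, no options); the function name is an illustrative assumption and is not part of ‘dram.c’ or ‘offload.c’.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check: returns 1 if the IP header in 'frame' has a valid
   checksum, 0 otherwise. Bytes are combined in network (big-endian) order. */
int ip_header_checksum_ok(const uint8_t *frame)
{
    uint32_t sum = 0;

    for (size_t i = 14; i < 34; i += 2)          /* ten 16-bit words */
        sum += ((uint32_t)frame[i] << 8) | frame[i + 1];

    while (sum >> 16)                            /* fold the carries back in */
        sum = (sum & 0xFFFF) + (sum >> 16);

    return sum == 0xFFFF;
}
```

Running this over the bytes displayed for the received packet should return 1, while the same bytes captured before transmission (checksum field still unset) would normally return 0.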
In-class exercise • The NIC can also deal with packets having the UDP protocol-format – but you need to employ different parameters in the Type 0 Context Descriptor and arrange a ‘header’ for the UDP segment that has a different length and arrangement of parameters • Also the UDP protocol-ID is 17 (=0x11)
UDP Header [Figure: UDP header in the traditional ‘Big-Endian’ representation]
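As a starting point for the exercise, here is a sketch of the standard 8-byte UDP header and of how the checksum offsets would shift if the Ethernet and IP headers keep the same lengths as in our TCP packets. The struct and constant names are assumptions made for illustration, and the offsets are derived by arithmetic only; they have not been tried on the hardware.

```c
#include <stddef.h>
#include <stdint.h>

/* Standard UDP header: four 16-bit fields, sent in big-endian order */
struct udp_header {
    uint16_t src_port;    /* Source Port      */
    uint16_t dst_port;    /* Destination Port */
    uint16_t length;      /* header + data    */
    uint16_t checksum;    /* UDP checksum     */
};

/* Hypothetical constants, assuming a 14-byte Ethernet header and a
   20-byte IP header (no options) in front of the UDP header */
enum {
    UDP_PROTOCOL_ID = 17,                        /* = 0x11                  */
    UDP_TUCSS       = 14 + 20,                   /* UDP header starts here  */
    UDP_TUCSO       = UDP_TUCSS + 6,             /* checksum is 6 bytes in  */
    UDP_HDR_TOTAL   = 14 + 20 + 8                /* bytes ahead of the data */
};
```

Under these assumptions the checksum-offset moves from 50 to 40, the headers shrink from 54 to 42 bytes, and the pseudo-header sum uses protocol-ID 17 in place of 6.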