Our ‘xmit1000.c’ driver Implementing a ‘packet-transmit’ capability with the Intel 82573L network interface controller
Remember ‘echo’ and ‘cat’?
• Your device-driver module (named ‘uart.c’) was supposed to allow two programs running on a pair of adjacent PCs to communicate via a “null-modem” cable
• Transmitting: $ echo Hello > /dev/uart
• Receiving: $ cat /dev/uart (displays: Hello)
‘keep it simple’ • Let’s try to implement a ‘write()’ routine for our Intel Pro/1000 ethernet controllers that will provide the same basic functionality as we achieved with our serial UART driver • It should allow us to transmit a message by using the familiar UNIX ‘cat’ command to redirect output to a character device-file • Our device-file will be named ‘/dev/nic’
Driver’s components
• my_fops: a ‘struct’ that holds one function-pointer (write), which points to my_write()
• my_write(): this function will program the actual data-transfer
• my_get_info(): this function will allow us to inspect the transmit-descriptors
• module_init(): this function will detect and configure the hardware, define page-mappings, allocate and initialize the descriptors, start the ‘transmit’ engine, create the pseudo-file and register ‘my_fops’
• module_exit(): this function will do the needed ‘cleanup’ when it’s time to unload our driver: turn off the ‘transmit’ engine, free the memory, delete the page-table entries and the pseudo-file, and unregister ‘my_fops’
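The link between ‘my_fops’ and my_write() is just a function-pointer stored in a struct. The sketch below is a user-space model only: ‘fake_file_operations’, ‘fake_vfs_write()’ and the simplified two-argument signature are hypothetical stand-ins for the kernel’s real ‘struct file_operations’ and VFS path. It shows how a designated initializer fills in only the method the driver supplies and leaves every other pointer NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>   /* ssize_t */

/* Hypothetical, simplified stand-in for the kernel's struct file_operations. */
struct fake_file_operations {
    ssize_t (*read)(const char *buf, size_t len);
    ssize_t (*write)(const char *buf, size_t len);
};

static char last_message[64];

/* Model of my_write(): stores the data and reports how many bytes it took. */
static ssize_t my_write(const char *buf, size_t len)
{
    if (len >= sizeof(last_message))
        len = sizeof(last_message) - 1;   /* confine the amount of user-data */
    memcpy(last_message, buf, len);
    last_message[len] = '\0';
    return (ssize_t)len;
}

/* Designated initializer: only .write is set; .read stays NULL. */
static struct fake_file_operations my_fops = { .write = my_write };

/* Hypothetical dispatcher playing the role of the kernel's 'write' path. */
static ssize_t fake_vfs_write(const struct fake_file_operations *fops,
                              const char *buf, size_t len)
{
    if (fops->write == NULL)
        return -1;                        /* the real VFS reports an error here */
    return fops->write(buf, len);
}
```

In the actual driver, the struct is handed to the kernel during module_init() and the kernel calls through the pointer whenever a program writes to ‘/dev/nic’.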
kzalloc()
• Linux kernels since 2.6.13 offer this convenient function for allocating pre-zeroed kernel memory
• It has the same syntax as the ‘kmalloc()’ function (described in our texts), but adds the after-effect of zeroing out the newly-allocated memory-area
• Thus it does two logically distinct actions (often coupled anyway) within a single function-call:

    void *kmem = kmalloc( region_size, GFP_KERNEL );
    if ( kmem != NULL ) memset( kmem, 0x00, region_size );

    /* can be replaced with */

    void *kmem = kzalloc( region_size, GFP_KERNEL );
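The ‘allocate, then zero’ coupling can be tried out in user space. This is an analogue, not kernel code: the helper name ‘zalloc()’ is hypothetical, and malloc()/memset() stand in for kmalloc(). Like kzalloc(), it returns NULL on failure, so a driver’s module_init() should still check the result.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical user-space analogue of kzalloc(): allocate, then zero.
   (In user space, calloc() already offers this behavior.) */
static void *zalloc(size_t region_size)
{
    void *mem = malloc(region_size);
    if (mem != NULL)
        memset(mem, 0x00, region_size);   /* only zero memory we actually got */
    return mem;
}
```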
Single page-frame option
• A single 4-KB page-frame holds both a Packet-Buffer (3-KB), which is reused for successive transmissions, and a Descriptor-Buffer (1-KB), with room for up to 64 sixteen-byte descriptors
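The byte arithmetic behind this layout can be checked with a few constants. A minimal sketch, assuming the packet-buffer occupies the first 3 KB and the descriptors the last 1 KB of the page-frame (the slide does not say which comes first); the helper ‘desc_phys_addr()’ is hypothetical. Since each transmit-descriptor is 16 bytes, a 1-KB area holds 1024 / 16 = 64 of them, comfortably more than the 8 our ring uses.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout: packet-buffer first, descriptor-buffer in the last 1 KB. */
enum {
    PAGE_FRAME_SIZE = 4096,
    PKTBUF_OFFSET   = 0,
    PKTBUF_SIZE     = 3 * 1024,
    DESCBUF_OFFSET  = PKTBUF_OFFSET + PKTBUF_SIZE,      /* 3072 */
    DESCBUF_SIZE    = PAGE_FRAME_SIZE - DESCBUF_OFFSET, /* 1024 */
    TX_DESC_SIZE    = 16,                               /* one Tx-descriptor */
    DESC_CAPACITY   = DESCBUF_SIZE / TX_DESC_SIZE,      /* 64 */
};

/* Physical address of descriptor n, given the page-frame's physical base. */
static unsigned long long desc_phys_addr(unsigned long long page_phys, unsigned n)
{
    assert(n < DESC_CAPACITY);
    return page_phys + DESCBUF_OFFSET + (unsigned long long)n * TX_DESC_SIZE;
}
```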
Our Tx-Descriptor ring
After writing the data into our packet-buffer and writing its length into the current TAIL descriptor, our driver will advance the TAIL index; the NIC responds by reading the current HEAD descriptor, fetching its data, and then advancing the HEAD index as it sends our data out over the wire.
[Figure: an array of 8 transmit-descriptors (descriptor 0 through descriptor 7), marked by the TAIL and HEAD indices, together with one ‘reusable’ packet-buffer (1536 bytes)]
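The HEAD/TAIL handshake can be modeled in user space. In this sketch the ‘NIC’ side is simulated by a hypothetical nic_consume() function; in the real driver, TAIL is written to register TDT with iowrite32() and HEAD is read back from TDH. One slot is deliberately left unused, so that HEAD == TAIL always means ‘nothing pending’.

```c
#include <assert.h>

#define N_TX_DESC 8   /* ring size used by the driver */

/* Model of the two ring indices (TDH and TDT in the real NIC). */
struct tx_ring {
    unsigned head;   /* next descriptor the NIC will fetch (TDH) */
    unsigned tail;   /* next descriptor the driver will fill (TDT) */
};

/* Ring is full when advancing TAIL would make it catch up with HEAD. */
static int ring_full(const struct tx_ring *r)
{
    return (r->tail + 1) % N_TX_DESC == r->head;
}

/* Driver side: hand one filled descriptor to the NIC by advancing TAIL. */
static int driver_post(struct tx_ring *r)
{
    if (ring_full(r))
        return -1;
    r->tail = (r->tail + 1) % N_TX_DESC;
    return 0;
}

/* NIC side (simulated): consume one descriptor by advancing HEAD. */
static int nic_consume(struct tx_ring *r)
{
    if (r->head == r->tail)
        return -1;                     /* HEAD == TAIL: nothing pending */
    r->head = (r->head + 1) % N_TX_DESC;
    return 0;
}
```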
‘/proc/xmit1000’ • This pseudo-file can be examined anytime to find out what values (if any) the NIC has ‘written back’ into the transmit-descriptors (i.e., the descriptor-status information) and current values in registers TDH and TDT: $ cat /proc/xmit1000
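The key piece of written-back status is the Descriptor Done (DD) bit, bit 0 of the status field. A sketch of formatting one line of such a listing, assuming a hypothetical output format and helper name ‘format_desc()’; snprintf() is used here in user space, while a kernel version would use the kernel’s own formatting helpers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Same 16-byte layout as the driver's TX_DESCRIPTOR (fixed-width types). */
typedef struct {
    uint64_t base_addr;
    uint16_t pkt_length;
    uint8_t  cksum_off;
    uint8_t  desc_cmd;
    uint8_t  desc_stat;
    uint8_t  cksum_org;
    uint16_t special;
} tx_desc_t;

#define TXD_STAT_DD 0x01   /* Descriptor Done: written back by the NIC */

/* Format one line of a hypothetical /proc/xmit1000 listing. */
static int format_desc(char *out, size_t size, unsigned idx, const tx_desc_t *d)
{
    return snprintf(out, size, "#%u: %016llX %04X %02X %02X %s",
                    idx, (unsigned long long)d->base_addr,
                    (unsigned)d->pkt_length, (unsigned)d->desc_cmd,
                    (unsigned)d->desc_stat,
                    (d->desc_stat & TXD_STAT_DD) ? "done" : "pending");
}
```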
Direct Memory Access • The NIC is able to ‘fetch’ descriptors from host-system’s memory (and also can read the data from our packet-buffer) as well as ‘store’ a status-report back into the host’s memory by temporarily becoming the BusMaster (taking control of the system-bus away from the CPU so that it can perform the ‘fetch’ and ‘store’ operations directly, without CPU involvement or interference)
Configuration registers
• CTRL: Device Control
• CTRL_EXT: Extended Device Control
• TIPG: Transmit Inter-Packet Gap
• TCTL: Transmit Control
• TDBAL: Transmit Descriptor-queue Base-Address (LOW)
• TDBAH: Transmit Descriptor-queue Base-Address (HIGH)
• TDLEN: Transmit Descriptor-queue Length
• TDH: Transmit Descriptor-queue HEAD
• TDT: Transmit Descriptor-queue TAIL
• TXDCTL: Transmit Descriptor-queue Control
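A sketch of pointing the NIC at the descriptor ring. The offsets use the same names and values as the Linux e1000 driver’s register macros (TIPG at 0x0410 and TCTL at 0x0400 also appear on later slides). The ‘regs’ array and reg_write()/reg_read() are hypothetical stand-ins for iowrite32()/ioread32() on the remapped i/o-memory. The 64-bit ring address is split across TDBAL and TDBAH, and TDLEN is given in bytes (8 descriptors make 128 bytes, which satisfies the hardware’s 128-byte-multiple requirement).

```c
#include <assert.h>
#include <stdint.h>

/* Register byte-offsets (same values as Linux's e1000 register macros). */
enum {
    E1000_CTRL     = 0x0000,
    E1000_CTRL_EXT = 0x0018,
    E1000_TCTL     = 0x0400,
    E1000_TIPG     = 0x0410,
    E1000_TDBAL    = 0x3800,
    E1000_TDBAH    = 0x3804,
    E1000_TDLEN    = 0x3808,
    E1000_TDH      = 0x3810,
    E1000_TDT      = 0x3818,
    E1000_TXDCTL   = 0x3828,
};

/* Simulated register file: index = byte offset / 4. */
static uint32_t regs[0x4000 / 4];

static void reg_write(uint32_t offset, uint32_t value) { regs[offset / 4] = value; }
static uint32_t reg_read(uint32_t offset) { return regs[offset / 4]; }

/* Point the NIC at a ring of n 16-byte descriptors and mark it empty. */
static void setup_tx_ring(uint64_t ring_phys, uint32_t n_desc)
{
    reg_write(E1000_TDBAL, (uint32_t)(ring_phys & 0xFFFFFFFFu));
    reg_write(E1000_TDBAH, (uint32_t)(ring_phys >> 32));
    reg_write(E1000_TDLEN, n_desc * 16);   /* queue length in bytes */
    reg_write(E1000_TDH, 0);               /* HEAD == TAIL: ring is empty */
    reg_write(E1000_TDT, 0);
}
```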
The ‘initialization’ sequence • Detect the network interface controller • Obtain its i/o-memory address and size • Remap the i/o-memory into kernel-space • Allocate memory for buffer and descriptors • Initialize the array of transmit-descriptors • Reset the NIC and configure its operations • Create the ‘/proc/xmit1000’ pseudo-file • Register our ‘write()’ driver-method
The ‘cleanup’ sequence • Usually the steps here follow those of the initialization sequence, but in reverse order: • Unregister the device-driver’s file-operations • Delete the ‘/proc/xmit1000’ pseudo-file • Disable the NIC’s ‘transmit’ engine • Release the allocated kernel-memory • Unmap the NIC’s i/o-memory region
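Kernel code commonly enforces this reverse ordering with ‘goto’-based unwinding, so that a failure partway through module_init() undoes only the steps that already succeeded. The following is a user-space model: the step names and the helpers note() and step() are hypothetical, and in a real driver calls such as unregister_chrdev(), remove_proc_entry(), kfree() and iounmap() would appear where the notes are recorded.

```c
#include <assert.h>
#include <string.h>

static char log_buf[256];                        /* record of actions taken */
static void note(const char *s) { strcat(log_buf, s); strcat(log_buf, " "); }

/* Hypothetical init step; step number 'fail_at' is made to fail. */
static int step(int n, int fail_at) { return n == fail_at ? -1 : 0; }

static int model_init(int fail_at)
{
    log_buf[0] = '\0';
    if (step(1, fail_at)) goto fail_map;
    note("map");
    if (step(2, fail_at)) goto fail_alloc;
    note("alloc");
    note("start-tx");                 /* configuring the NIC: assumed not to fail */
    if (step(3, fail_at)) goto fail_proc;
    note("proc");
    if (step(4, fail_at)) goto fail_register;
    note("register");
    return 0;

fail_register:                        /* undo in reverse order, falling through */
    note("unproc");
fail_proc:
    note("stop-tx");
    note("free");
fail_alloc:
    note("unmap");
fail_map:
    return -1;
}

/* module_exit(): the full reverse sequence. */
static void model_exit(void)
{
    note("unregister");
    note("unproc");
    note("stop-tx");
    note("free");
    note("unmap");
}
```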
Our ‘write()’ algorithm • Get index of the current TAIL descriptor • Confine the amount of user-data • Copy user-data into the packet-buffer • Setup the packet’s Ethernet Header • Setup packet-length in the TAIL descriptor • Now hand over this descriptor to the NIC (by advancing the value in register TDT) • Tell the kernel how many bytes were sent
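The steps above can be sketched as a user-space model. Several details are assumptions, not taken from the slides: the destination is the broadcast address, the EtherType 0x8888 and the MAC address are hypothetical placeholders, memcpy() stands in for copy_from_user(), and a plain variable stands in for the TDT register. The CMD bits chosen are EOP (end of packet), IFCS (insert the CRC) and RS (report status). Short frames need no manual padding here, because the PSP bit in TCTL tells the NIC to pad them.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define N_TX_DESC    8
#define BUF_SIZE     1536
#define ETH_HLEN     14        /* dst MAC (6) + src MAC (6) + type (2) */
#define MAX_PAYLOAD  1500

#define TXD_CMD_EOP  0x01      /* End Of Packet */
#define TXD_CMD_IFCS 0x02      /* Insert FCS (CRC) */
#define TXD_CMD_RS   0x08      /* Report Status (write back DD) */

typedef struct {
    uint64_t base_addr;
    uint16_t pkt_length;
    uint8_t  cksum_off, desc_cmd, desc_stat, cksum_org;
    uint16_t special;
} tx_desc_t;

static tx_desc_t txring[N_TX_DESC];
static uint8_t   pktbuf[BUF_SIZE];
static uint32_t  tdt;                                 /* stands in for TDT */

static const uint8_t  my_mac[6] = { 0x00, 0x11, 0x22, 0x33, 0x44, 0x55 }; /* hypothetical */
static const uint16_t eth_type  = 0x8888;                                 /* hypothetical */

/* Model of my_write(): returns the number of user bytes accepted. */
static long model_write(const char *ubuf, size_t len)
{
    uint32_t tail = tdt;                              /* current TAIL */
    if (len > MAX_PAYLOAD) len = MAX_PAYLOAD;         /* confine user-data */
    memcpy(pktbuf + ETH_HLEN, ubuf, len);             /* copy_from_user() in the kernel */
    memset(pktbuf, 0xFF, 6);                          /* dst = broadcast */
    memcpy(pktbuf + 6, my_mac, 6);                    /* src = our MAC */
    pktbuf[12] = (uint8_t)(eth_type >> 8);            /* type, big-endian */
    pktbuf[13] = (uint8_t)(eth_type & 0xFF);
    txring[tail].pkt_length = (uint16_t)(ETH_HLEN + len);
    txring[tail].desc_cmd   = TXD_CMD_EOP | TXD_CMD_IFCS | TXD_CMD_RS;
    txring[tail].desc_stat  = 0;
    tdt = (tail + 1) % N_TX_DESC;                     /* iowrite32() to TDT */
    return (long)len;                                 /* bytes 'sent' */
}
```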
Recall Tx-Descriptor Layout (16 bytes; bit 31 on the left, bit 0 on the right)
• Offset 0x0: Buffer-Address low (bits 31..0)
• Offset 0x4: Buffer-Address high (bits 63..32)
• Offset 0x8: CMD (bits 31..24) | CSO (bits 23..16) | Packet Length in bytes (bits 15..0)
• Offset 0xC: special (bits 31..16) | CSS (bits 15..8) | reserved =0 (bits 7..4) | status (bits 3..0)
Buffer-Address = the packet-buffer’s 64-bit address in physical memory; Packet-Length = number of bytes in the data-packet to be transmitted; CMD = Command-field; CSO/CSS = Checksum Offset/Start (in bytes); STA = Status-field
Suggested C syntax

    typedef struct {
        unsigned long long  base_addr;
        unsigned short      pkt_length;
        unsigned char       cksum_off;
        unsigned char       desc_cmd;
        unsigned char       desc_stat;
        unsigned char       cksum_org;
        unsigned short      special;
    } TX_DESCRIPTOR;
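One way to confirm that the suggested struct really mirrors the 16-byte hardware layout is to let the compiler check it. This sketch swaps in <stdint.h> fixed-width types (an assumption: on x86 the original ‘unsigned long long’, ‘short’ and ‘char’ have these same widths) and uses C11 static_assert. The byte offsets shown for fields inside a 32-bit word assume a little-endian CPU such as x86.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed-width version of the suggested TX_DESCRIPTOR. */
typedef struct {
    uint64_t base_addr;    /* 0x0: buffer physical address (64 bits) */
    uint16_t pkt_length;   /* 0x8: bits 15..0                        */
    uint8_t  cksum_off;    /* 0xA: CSO, bits 23..16                  */
    uint8_t  desc_cmd;     /* 0xB: CMD, bits 31..24                  */
    uint8_t  desc_stat;    /* 0xC: STA (bits 3..0) + reserved (7..4) */
    uint8_t  cksum_org;    /* 0xD: CSS                               */
    uint16_t special;      /* 0xE: special                           */
} TX_DESCRIPTOR;

/* Compile-time checks that the C layout matches the 16-byte format. */
static_assert(sizeof(TX_DESCRIPTOR) == 16, "descriptor must be 16 bytes");
static_assert(offsetof(TX_DESCRIPTOR, pkt_length) == 0x8, "length at 0x8");
static_assert(offsetof(TX_DESCRIPTOR, desc_cmd)   == 0xB, "CMD at 0xB");
static_assert(offsetof(TX_DESCRIPTOR, desc_stat)  == 0xC, "STA at 0xC");
static_assert(offsetof(TX_DESCRIPTOR, special)    == 0xE, "special at 0xE");
```

Because no field needs extra alignment padding in this order, the compiler packs the struct to exactly 16 bytes without any ‘packed’ attribute.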
Transmit IPG (0x0410)
• IPG = Inter-Packet Gap
• Bits 31..30: reserved (=0)
• Bits 29..20: IPG After Deferral (recommended value = 7)
• Bits 19..10: IPG Part 1 (recommended value = 8)
• Bits 9..0: IPG Back-To-Back (recommended value = 8)
This register controls the Inter-Packet Gap timer for the Ethernet controller. The recommended TIPG register-value for IEEE 802.3-compliant minimum IPG values in both full- and half-duplex operation is 0x00702008, i.e. (7<<20) | (8<<10) | (8<<0). (82573L)
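The recommended value can be built from its three 10-bit fields. A small sketch; the function and parameter names are illustrative (ipgt = Back-To-Back, ipgr1 = Part 1, ipgr2 = After Deferral).

```c
#include <assert.h>
#include <stdint.h>

/* Pack the three 10-bit TIPG fields into one 32-bit register value. */
static uint32_t tipg_value(uint32_t ipgt, uint32_t ipgr1, uint32_t ipgr2)
{
    assert(ipgt < 1024 && ipgr1 < 1024 && ipgr2 < 1024);   /* 10 bits each */
    return (ipgr2 << 20) | (ipgr1 << 10) | (ipgt << 0);
}
```

With the recommended inputs, tipg_value(8, 8, 7) yields 0x00702008, which the driver would then write with iowrite32() to offset 0x0410.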
Transmit Control (0x0400)
• Bits 31..29: reserved (=0)
• Bit 28: MULR
• Bits 27..26: TXCSCMT
• Bit 25: UNORTX
• Bit 24: RTLC
• Bit 23: reserved (=0)
• Bit 22: SWXOFF
• Bits 21..12: COLD (Collision Distance; upper 6 bits in 21..16, lower 4 bits in 15..12)
• Bits 11..4: CT (Collision Threshold)
• Bit 3: PSP
• Bit 2: reserved (=0)
• Bit 1: EN
• Bit 0: reserved (=0)
EN = Transmit Enable; SWXOFF = Software XOFF Transmission; PSP = Pad Short Packets; RTLC = Re-Transmit on Late Collision; CT = Collision Threshold (=0xF); UNORTX = Underrun No Re-Transmit; COLD = Collision Distance (=0x3F); TXCSCMT = Tx-Descriptor Minimum Threshold; MULR = Multiple Request Support
Our driver’s choices
Here’s a C programming style that ‘documents’ the programmer’s choices:

    int tx_control = 0;
    tx_control |= (0<<1);    // EN-bit (Enable Transmit Engine)
    tx_control |= (1<<3);    // PSP-bit (Pad Short Packets)
    tx_control |= (15<<4);   // CT=15 (Collision Threshold)
    tx_control |= (63<<12);  // COLD=63 (Collision Distance)
    tx_control |= (0<<22);   // SWXOFF-bit (Software XOFF Tx)
    tx_control |= (1<<24);   // RTLC-bit (Re-Transmit on Late Collision)
    tx_control |= (0<<25);   // UNORTX-bit (Underrun No Re-Transmit)
    tx_control |= (0<<26);   // TXCSCMT=0 (Tx-descriptor Min Threshold)
    tx_control |= (0<<28);   // MULR-bit (Multiple Request Support)
    iowrite32( tx_control, io + E1000_TCTL );  // Transmit Control register
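The choices above leave EN cleared, so the transmit engine is switched on later, once the descriptors are ready. A sketch of that follow-up step as a read-modify-write that touches only the EN bit; the macro names are hypothetical, and a plain variable stands in for the register (in the driver this would be ioread32()/iowrite32() on io + E1000_TCTL).

```c
#include <assert.h>
#include <stdint.h>

#define TCTL_EN      (1u << 1)
#define TCTL_PSP     (1u << 3)
#define TCTL_CT(x)   ((uint32_t)(x) << 4)    /* bits 11..4  */
#define TCTL_COLD(x) ((uint32_t)(x) << 12)   /* bits 21..12 */
#define TCTL_RTLC    (1u << 24)

static uint32_t tctl_reg;   /* stands in for the NIC's TCTL register */

/* Same choices as the slide's code, with the transmit engine disabled. */
static uint32_t tctl_initial(void)
{
    return TCTL_PSP | TCTL_CT(15) | TCTL_COLD(63) | TCTL_RTLC;
}

/* Later: read-modify-write changes only the EN bit. */
static void tx_enable(void)  { tctl_reg |= TCTL_EN; }
static void tx_disable(void) { tctl_reg &= ~TCTL_EN; }
```

The initial value works out to 0x0103F0F8, and enabling the engine turns it into 0x0103F0FA.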
An ‘e1000.c’ anomaly? • The official Linux kernel is delivered with a device-driver supporting several of Intel’s ‘Pro/1000’ gigabit ethernet controllers • Often this driver gets loaded by default during the system’s startup procedures • But it will interfere with your own driver if you try to write a substitute for ‘e1000.ko’ • So you will want to remove it with ‘rmmod’
Side-effect of ‘rmmod’ • We’ve observed an unexpected side-effect of ‘unloading’ the ‘e1000.ko’ device-driver • The PCI Configuration Space’s command register gets modified in a way that keeps the NIC from working with your own driver • Specifically, the Bus Mastering capability gets disabled (by clearing bit #2 in the PCI Configuration Space’s word at address 4)
What to do about it?
• This effect doesn’t arise on our ‘anchor’ cluster machines, but you may encounter it when you try using our demo elsewhere
• Here’s the simple “fix” to turn Bus Master capability back on (in your ‘module_init()’):

    u16 pci_cmd;                                 // declares a 16-bit variable
    pci_read_config_word( devp, 4, &pci_cmd );   // read current word
    pci_cmd |= (1<<2);                           // turn on the Bus Master enabled-bit
    pci_write_config_word( devp, 4, pci_cmd );   // write modification
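The same read-modify-write can be modeled against a simulated configuration space, which shows why it is safe: only bit 2 changes, and running it twice is harmless. The array and accessor names here are hypothetical stand-ins for pci_read_config_word()/pci_write_config_word(); in the kernel, the constants PCI_COMMAND (0x04) and PCI_COMMAND_MASTER (0x4) from <linux/pci_regs.h> name these values, and pci_set_master(devp) performs an equivalent update.

```c
#include <assert.h>
#include <stdint.h>

#define CFG_COMMAND        0x04     /* same value as Linux's PCI_COMMAND */
#define CFG_COMMAND_MASTER 0x0004   /* bit 2, same as PCI_COMMAND_MASTER */

static uint8_t cfg_space[256];      /* simulated PCI configuration space */

/* Little-endian 16-bit accessors, like pci_read/write_config_word(). */
static uint16_t cfg_read16(unsigned off)
{
    return (uint16_t)(cfg_space[off] | (cfg_space[off + 1] << 8));
}

static void cfg_write16(unsigned off, uint16_t v)
{
    cfg_space[off]     = (uint8_t)(v & 0xFF);
    cfg_space[off + 1] = (uint8_t)(v >> 8);
}

/* Turn Bus Mastering back on without disturbing the other command bits. */
static void enable_bus_master(void)
{
    uint16_t cmd = cfg_read16(CFG_COMMAND);
    if (!(cmd & CFG_COMMAND_MASTER))
        cfg_write16(CFG_COMMAND, (uint16_t)(cmd | CFG_COMMAND_MASTER));
}
```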
In-class demo
• We demonstrate our ‘xmit1000.c’ driver on an ‘anchor’ machine, with some help from a companion-module (named ‘recv1000.c’) which will soon be discussed in class
• Transmitting (anchor01): $ echo Hello > /dev/nic
• Receiving (anchor05): $ cat /dev/nic (displays: Hello)
• Both machines are connected to the same LAN
In-class exercise • Open three or more terminal-windows on your PC’s graphical desktop, and login to a different ‘anchor’ machine in each one • Install the ‘xmit1000.ko’ module on one of the anchor machines, and then install our ‘recv1000.ko’ module on the other stations • Execute the ‘cat /dev/nic’ command on the receiver-stations, and then run an ‘echo’ command on the transmitter-station