What’s needed to receive? A look at the minimum steps required for programming our 82573L nic to receive packets
Accessing 82573L registers
• Device registers are hardware mapped to a range of addresses in physical memory
• We can get the location and extent of this memory-range from a BAR register in the 82573L device’s PCI Configuration Space
• We then ask the Linux kernel to set up an I/O ‘remapping’ of this memory-range to ‘virtual’ addresses within kernel-space
Linux address-spaces
[Figure: the physical address-space (64-GB) holds dynamic ram and the nic registers; the ‘virtual’ address-space is divided into a 128-TB user space (.text, .data, .bss, shared libraries, stack) and a 128-TB kernel space (kernel code/data, dynamic ram, nic registers)]
Kernel memory allocation
• The NIC requires some host memory for packet-buffers and receive descriptors
• The kernel provides a ‘helper function’ for reserving a suitable region of memory in kernel-space which is both ‘non-pageable’ and ‘physically contiguous’ (i.e., kzalloc())
• It’s our job to decide how much memory our network controller hardware will need
Format for an Rx Descriptor
[Figure: 16-byte descriptor] Base-address (64-bits) | Packet-length | Packet-checksum | status | errors | VLAN tag
• The device-driver initializes this ‘base-address’ field with the physical address of a packet-buffer
• The network controller will ‘write-back’ the values for the other fields when it has transferred a received packet’s data into this descriptor’s packet-buffer
Suggested C syntax
typedef struct {
    unsigned long long base_address;
    unsigned short packet_length;
    unsigned short packet_cksum;
    unsigned char desc_status;
    unsigned char desc_errors;
    unsigned short vlan_tag;
} RX_DESCRIPTOR;
‘Legacy Format’ for the Intel Pro1000 network controller’s Receive Descriptors
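The 16-byte size of this structure depends on the compiler's type widths and padding, so it can be worth checking at compile time. The sketch below restates the same layout with fixed-width types; the name `rx_descriptor_t` and the use of `static_assert` are illustrative choices, not part of the course's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same field order as RX_DESCRIPTOR, using fixed-width types
 * (rx_descriptor_t is a hypothetical name for this sketch). */
typedef struct {
    uint64_t base_address;   /* physical address of the packet-buffer */
    uint16_t packet_length;  /* written back by the controller */
    uint16_t packet_cksum;   /* written back by the controller */
    uint8_t  desc_status;    /* written back by the controller */
    uint8_t  desc_errors;    /* written back by the controller */
    uint16_t vlan_tag;       /* written back by the controller */
} rx_descriptor_t;

/* Fail the build if the layout is not the 16 bytes the slide describes. */
static_assert(sizeof(rx_descriptor_t) == 16, "descriptor must be 16 bytes");
static_assert(offsetof(rx_descriptor_t, packet_length) == 8, "length at byte 8");
static_assert(offsetof(rx_descriptor_t, desc_status) == 12, "status at byte 12");
static_assert(offsetof(rx_descriptor_t, vlan_tag) == 14, "VLAN tag at byte 14");
```

With natural alignment no padding is inserted, so an array of eight such descriptors occupies exactly 128 bytes.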
Ethernet packet layout
• Total size normally can vary from 64 bytes up to 1522 bytes (unless ‘jumbo’ packets and/or ‘undersized’ packets are enabled)
• The NIC expects a 14-byte packet ‘header’ and it appends a 4-byte CRC check-sum
[Figure: packet fields by byte offset]
offset 0: destination MAC address (6-bytes)
offset 6: source MAC address (6-bytes)
offset 12: Type/length (2-bytes)
offset 14: the packet’s data ‘payload’ goes here (usually varies from 46 to 1500 bytes)
after the payload: Cyclic Redundancy Checksum (4-bytes)
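As a small illustration of the header layout above, the following sketch pulls the three header fields out of a raw byte buffer. The type `eth_header_view_t` and the function `eth_parse_header` are hypothetical names introduced here; the only facts taken from the slide are the offsets 0, 6 and 12 and the 14-byte header size. The Type/length field is read in network (big-endian) byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { ETH_MAC_BYTES = 6, ETH_HEADER_BYTES = 14 };

/* Decoded copy of the 14-byte header (hypothetical helper type). */
typedef struct {
    uint8_t  dst[ETH_MAC_BYTES];   /* bytes 0..5  */
    uint8_t  src[ETH_MAC_BYTES];   /* bytes 6..11 */
    uint16_t type_or_length;       /* bytes 12..13, big-endian on the wire */
} eth_header_view_t;

/* Returns 0 on success, -1 if the buffer is too short to hold a header. */
static inline int eth_parse_header(const uint8_t *frame, size_t len,
                                   eth_header_view_t *out)
{
    if (len < ETH_HEADER_BYTES)
        return -1;
    memcpy(out->dst, frame, ETH_MAC_BYTES);
    memcpy(out->src, frame + ETH_MAC_BYTES, ETH_MAC_BYTES);
    out->type_or_length = (uint16_t)((frame[12] << 8) | frame[13]);
    return 0;
}
```

A buffer whose bytes 12 and 13 are 0x08 and 0x00 would decode to a Type/length value of 0x0800.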
Rx-Descriptor Ring-Buffer
[Figure: a circular array of 16-byte descriptors at offsets 0x00, 0x10, 0x20, ... 0x80 from RDBA (base-address); RDLEN gives its length in bytes; RDH (head) and RDT (tail) mark the boundary between descriptors owned by hardware (nic) and descriptors owned by software (cpu)]
Circular buffer (128-bytes minimum, and must be a multiple of 128 bytes)
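The index arithmetic for such a ring can be sketched in a few lines. This is only an illustration: the helper names are hypothetical, and it assumes the usual convention that the controller owns the descriptors from the head up to, but not including, the tail (so head equal to tail means the controller owns none).

```c
#include <stdint.h>

enum {
    RX_RING_ENTRIES = 8,    /* eight descriptors, as in the figure */
    RX_DESC_BYTES   = 16    /* each descriptor is 16 bytes */
};

/* Value for RDLEN: 8 * 16 = 128 bytes, the minimum legal ring size. */
static inline uint32_t ring_length_bytes(void)
{
    return (uint32_t)RX_RING_ENTRIES * RX_DESC_BYTES;
}

/* Advance an index by one slot, wrapping at the end of the ring. */
static inline uint32_t ring_next(uint32_t index)
{
    return (index + 1u) % RX_RING_ENTRIES;
}

/* Number of descriptors in [head, tail), i.e. owned by the nic
 * under the convention assumed in the lead-in. */
static inline uint32_t ring_hw_owned(uint32_t head, uint32_t tail)
{
    return (tail + RX_RING_ENTRIES - head) % RX_RING_ENTRIES;
}
```

For example, with head at 6 and tail at 2 the range wraps around the end of the array and covers slots 6, 7, 0 and 1.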
Packet-buffers and descriptors
• Our ‘nicrx.c’ module allocates 8 buffers of size 2K-bytes (i.e., more than enough for any normal Ethernet packets)
[Figure: 16K + 128 bytes allocated: 16K for the eight packet-buffers, plus 128 bytes for the Rx Descriptor Queue]
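The arithmetic behind the ‘16K + 128’ figure can be written out as follows. The helper names are hypothetical, and placing the eight buffers first with the descriptor queue after them is an assumption made for this sketch; the slide states only the sizes.

```c
#include <stddef.h>

enum {
    RX_BUFFER_COUNT = 8,      /* eight packet-buffers */
    RX_BUFFER_BYTES = 2048,   /* 2K-bytes each */
    RX_DESC_BYTES   = 16      /* one 16-byte descriptor per buffer */
};

/* 8 * 2048 = 16384 bytes of packet-buffers. */
static inline size_t rx_buffers_bytes(void)
{
    return (size_t)RX_BUFFER_COUNT * RX_BUFFER_BYTES;
}

/* 8 * 16 = 128 bytes of descriptors. */
static inline size_t rx_queue_bytes(void)
{
    return (size_t)RX_BUFFER_COUNT * RX_DESC_BYTES;
}

/* Total size to request from the kernel allocator. */
static inline size_t rx_total_bytes(void)
{
    return rx_buffers_bytes() + rx_queue_bytes();
}

/* Offset of buffer i from the start of the block (assumed layout). */
static inline size_t rx_buffer_offset(unsigned i)
{
    return (size_t)i * RX_BUFFER_BYTES;
}

/* Offset of the descriptor queue (assumed to follow the buffers). */
static inline size_t rx_queue_offset(void)
{
    return rx_buffers_bytes();
}
```

Under that assumed layout the queue starts at offset 16384, which is a multiple of 16, so the descriptors stay aligned on their own size whenever the block itself is.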
RxDesc Status-field
bit 7: PIF = Passed In exact Filter (1=yes, 0=no) shows if software must check
bit 6: IPCS = IPv4 Checksum calculated on packet (1=yes, 0=no)
bit 5: TCPCS = TCP Checksum calculated in packet (1=yes, 0=no)
bit 4: UDPCS = UDP Checksum calculated in packet (1=yes, 0=no)
bit 3: VP = VLAN Packet match (1=yes, 0=no)
bit 2: IXSM = Ignore Checksum Indications (1=yes, 0=no)
bit 1: EOP = End Of Packet (1=yes, 0=no) shows if this packet is logically last
bit 0: DD = Descriptor Done (1=yes, 0=no) shows if nic is finished with descriptor
RxDesc Error-field
bit 7: RXE = Received-data Error (1=yes, 0=no)
bit 6: IPE = IPv4-checksum error (1=yes, 0=no)
bit 5: TCPE = TCP/UDP checksum error (1=yes, 0=no)
bit 4: reserved (=0)
bit 3: reserved (=0)
bit 2: SEQ = Sequence error (1=yes, 0=no)
bit 1: SE = Symbol Error (1=yes, 0=no)
bit 0: CE = CRC Error or alignment error (1=yes, 0=no)
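The two bit-fields above translate directly into masks. The sketch below is one way a driver might test a written-back descriptor; the macro and function names are hypothetical, and treating only RXE, SEQ, SE and CE as fatal is an assumption made for illustration.

```c
#include <stdint.h>

/* Status-field bits, numbered as in the Status-field slide. */
#define RXD_STAT_DD     0x01u   /* Descriptor Done */
#define RXD_STAT_EOP    0x02u   /* End Of Packet */
#define RXD_STAT_IXSM   0x04u   /* Ignore Checksum Indications */
#define RXD_STAT_VP     0x08u   /* VLAN Packet match */
#define RXD_STAT_UDPCS  0x10u   /* UDP Checksum calculated */
#define RXD_STAT_TCPCS  0x20u   /* TCP Checksum calculated */
#define RXD_STAT_IPCS   0x40u   /* IPv4 Checksum calculated */
#define RXD_STAT_PIF    0x80u   /* Passed In exact Filter */

/* Error-field bits, numbered as in the Error-field slide. */
#define RXD_ERR_CE      0x01u   /* CRC or alignment error */
#define RXD_ERR_SE      0x02u   /* Symbol Error */
#define RXD_ERR_SEQ     0x04u   /* Sequence error */
#define RXD_ERR_TCPE    0x20u   /* TCP/UDP checksum error */
#define RXD_ERR_IPE     0x40u   /* IPv4-checksum error */
#define RXD_ERR_RXE     0x80u   /* Received-data Error */

/* True when the nic has finished with a descriptor holding a whole packet. */
static inline int rx_desc_complete(uint8_t status)
{
    const uint8_t need = RXD_STAT_DD | RXD_STAT_EOP;
    return (status & need) == need;
}

/* True when none of the link-level error bits are set (assumed policy). */
static inline int rx_desc_clean(uint8_t errors)
{
    return (errors & (RXD_ERR_RXE | RXD_ERR_SEQ | RXD_ERR_SE | RXD_ERR_CE)) == 0;
}
```

A status byte of 0x03 (DD and EOP both set) with an error byte of 0x00 would pass both checks.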
Essential ‘receive’ registers
enum {
    E1000_CTRL   = 0x0000,  // Device Control
    E1000_STATUS = 0x0008,  // Device Status
    E1000_RCTL   = 0x0100,  // Receive Control
    E1000_RDBAL  = 0x2800,  // Rx Descriptor Base Address Low
    E1000_RDBAH  = 0x2804,  // Rx Descriptor Base Address High
    E1000_RDLEN  = 0x2808,  // Rx Descriptor Length
    E1000_RDH    = 0x2810,  // Rx Descriptor Head
    E1000_RDT    = 0x2818,  // Rx Descriptor Tail
    E1000_RXDCTL = 0x2828,  // Rx Descriptor Control
    E1000_RA     = 0x5400,  // Receive address-filter Array
};
Programming steps
1) Detect the presence of the 82573L network controller (VENDOR_ID, DEVICE_ID)
2) Obtain the physical address-range where the nic’s device-registers are mapped
3) Ask the kernel to map this address range into the kernel’s virtual address-space
4) Copy the network controller’s MAC-address into a 6-byte array for future access
5) Allocate a block of kernel memory large enough for our descriptors and buffers
6) Ensure that the network controller’s ‘Bus Master’ capability has been enabled
7) Select our desired configuration-options for the DEVICE CONTROL register
8) Perform a nic ‘reset’ operation (by toggling bit 26), then delay until reset completes
9) Select our desired configuration-options for the RECEIVE CONTROL register
10) Initialize our array of Receive Descriptors with the physical addresses of buffers
11) Initialize the Receive Engine’s registers (for Rx-Descriptor Queue and Control)
12) Give ‘ownership’ of all of our Rx-Descriptors to the network controller
13) Enable the Receive Engine
14) Install our ‘/proc/nicrx’ pseudo-file (for user-diagnostic purposes)
NOTE: Steps 1) through 8) are the same as for our ‘nictx.c’ kernel module.
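Part of the step that initializes the Receive Engine’s registers is telling the controller where the descriptor queue lives, which means splitting a 64-bit physical address across the RDBAL and RDBAH registers. The sketch below shows that split against a plain array standing in for the register window, so it can be tried in user space. In the real module the pointer would come from the kernel’s I/O remapping; `reg_write`, `reg_read` and `rx_queue_program` are hypothetical helpers, and the value written to the tail register is left out because the slides do not state it.

```c
#include <stdint.h>

/* Register offsets taken from the 'receive' registers slide. */
enum {
    REG_RDBAL = 0x2800,   /* Rx Descriptor Base Address Low */
    REG_RDBAH = 0x2804,   /* Rx Descriptor Base Address High */
    REG_RDLEN = 0x2808,   /* Rx Descriptor Length */
    REG_RDH   = 0x2810    /* Rx Descriptor Head */
};

/* 'io' stands in for the remapped register window; registers are
 * 32 bits wide, so a byte offset is divided by 4 to index the array. */
static inline void reg_write(volatile uint32_t *io, uint32_t offset, uint32_t value)
{
    io[offset / 4] = value;
}

static inline uint32_t reg_read(volatile uint32_t *io, uint32_t offset)
{
    return io[offset / 4];
}

/* Describe the descriptor queue to the controller. */
static inline void rx_queue_program(volatile uint32_t *io,
                                    uint64_t ring_phys, uint32_t ring_bytes)
{
    reg_write(io, REG_RDBAL, (uint32_t)(ring_phys & 0xFFFFFFFFu)); /* low 32 bits */
    reg_write(io, REG_RDBAH, (uint32_t)(ring_phys >> 32));         /* high 32 bits */
    reg_write(io, REG_RDLEN, ring_bytes);                          /* length in bytes */
    reg_write(io, REG_RDH, 0);                                     /* head starts at slot 0 */
}
```

For a queue at physical address 0x123456000 this writes 0x23456000 to RDBAL and 0x1 to RDBAH.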
Device Control (0x0000)
[Register layout for the 82573L, fields listed from the high bit to the low bit]
Bits 31 down to 16: PHYRST | VME | R=0 | TFCE | RFCE | RST | R=0 | R=0 | R=0 | R=0 | R=0 | ADVD3WUC | R=0 | D/UD status | R=0 | R=0
Bits 15 down to 0: R=0 | R=0 | R=0 | FRCDPLX | FRCSPD | R=0 | SPEED | R=0 | SLU | R=0 | R=0 | R=1 | GIOMD | R=0 | FD
FD = Full-Duplex
SPEED (00=10Mbps, 01=100Mbps, 10=1000Mbps, 11=reserved)
GIOMD = GIO Master Disable
SLU = Set Link Up
FRCSPD = Force Speed
FRCDPLX = Force Duplex
D/UD = Dock/Undock status
ADVD3WUC = Advertise Cold Wake Up Capability
RST = Device Reset
RFCE = Rx Flow-Control Enable
TFCE = Tx Flow-Control Enable
VME = VLAN Mode Enable
PHYRST = Phy Reset
We used 0x04000A49 to initiate a ‘device reset’ operation
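To see which fields the value 0x04000A49 selects, it can be rebuilt from named bits. The macro names below are hypothetical, and the bit positions are counted from the register layout on this slide (with SPEED taken as a two-bit field in bits 9 and 8); the position of RST agrees with the ‘bit 26’ mentioned in the programming steps.

```c
#include <assert.h>
#include <stdint.h>

#define CTRL_FD          (1u << 0)    /* Full-Duplex */
#define CTRL_BIT3        (1u << 3)    /* reserved bit drawn as R=1 */
#define CTRL_SLU         (1u << 6)    /* Set Link Up */
#define CTRL_SPEED_1000  (2u << 8)    /* SPEED = 10b, 1000Mbps */
#define CTRL_FRCSPD      (1u << 11)   /* Force Speed */
#define CTRL_RST         (1u << 26)   /* Device Reset */

/* The value the slide reports for starting a device reset. */
#define CTRL_RESET_VALUE (CTRL_RST | CTRL_FRCSPD | CTRL_SPEED_1000 | \
                          CTRL_SLU | CTRL_BIT3 | CTRL_FD)

static_assert(CTRL_RESET_VALUE == 0x04000A49u, "matches the slide's value");

/* True when a Device Control value has the reset bit set. */
static inline int ctrl_reset_requested(uint32_t ctrl)
{
    return (ctrl & CTRL_RST) != 0;
}
```

Read this way, the value requests a reset while forcing 1000Mbps, full-duplex, with the link set up.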
Device Status (0x0008)
[Register layout for the 82573L, fields listed from the high bit to the low bit]
Bits 31 down to 16: ? (some undocumented functionality?) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | GIO Master EN | 0 | 0 | 0
Bits 15 down to 0: 0 | 0 | 0 | 0 | 0 | PHYRA | ASDV | SPEED | 0 | TXOFF | Function ID | LU | FD
FD = Full-Duplex
LU = Link Up
TXOFF = Transmission Paused
SPEED (00=10Mbps, 01=100Mbps, 10=1000Mbps, 11=reserved)
ASDV = Auto-negotiation Speed Detection Value
PHYRA = PHY Reset Asserted
Receive Control (0x0100)
[Register layout, fields listed from the high bit to the low bit]
Bits 31 down to 16: R=0 | 0 | FLXBUF | 0 | SECRC | BSEX | R=0 | PMCF | DPF | R=0 | CFI | CFIEN | VFE | BSIZE
Bits 15 down to 0: BAM | R=0 | MO | DTYP | RDMTS | LBM | LPE | MPE | UPE | SBP | EN | R=0
EN = Receive Enable
SBP = Store Bad Packets
UPE = Unicast Promiscuous Enable
MPE = Multicast Promiscuous Enable
LPE = Long Packet reception Enable
LBM = Loopback Mode
RDMTS = Rx-Descriptor Minimum Threshold Size
DTYP = Descriptor Type
MO = Multicast Offset
BAM = Broadcast Accept Mode
BSIZE = Receive Buffer Size
VFE = VLAN Filter Enable
CFIEN = Canonical Form Indicator Enable
CFI = Canonical Form Indicator bit-value
DPF = Discard Pause Frames
PMCF = Pass MAC Control Frames
BSEX = Buffer Size Extension
SECRC = Strip Ethernet CRC
FLXBUF = Flexible Buffer size
We used 0x1440821C in RCTL to prepare the ‘receive engine’ prior to enabling it
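The value 0x1440821C can be decomposed the same way. The macro names are hypothetical and the bit positions are counted from the layout on this slide; bit 28 is assumed to fall inside the FLXBUF field, whose exact extent the flattened diagram does not make clear. Note that EN (bit 1) is not part of the value, which fits the remark that it only prepares the receive engine.

```c
#include <assert.h>
#include <stdint.h>

#define RCTL_EN          (1u << 1)    /* Receive Enable */
#define RCTL_SBP         (1u << 2)    /* Store Bad Packets */
#define RCTL_UPE         (1u << 3)    /* Unicast Promiscuous Enable */
#define RCTL_MPE         (1u << 4)    /* Multicast Promiscuous Enable */
#define RCTL_RDMTS_HI    (1u << 9)    /* upper bit of the two-bit RDMTS field */
#define RCTL_BAM         (1u << 15)   /* Broadcast Accept Mode */
#define RCTL_DPF         (1u << 22)   /* Discard Pause Frames */
#define RCTL_SECRC       (1u << 26)   /* Strip Ethernet CRC */
#define RCTL_BIT28       (1u << 28)   /* assumed to belong to FLXBUF */

/* The value the slide reports for preparing the receive engine. */
#define RCTL_PREPARE_VALUE (RCTL_BIT28 | RCTL_SECRC | RCTL_DPF | RCTL_BAM | \
                            RCTL_RDMTS_HI | RCTL_MPE | RCTL_UPE | RCTL_SBP)

static_assert(RCTL_PREPARE_VALUE == 0x1440821Cu, "matches the slide's value");

/* Turn the receive engine on without disturbing the other options. */
static inline uint32_t rctl_enable(uint32_t rctl)
{
    return rctl | RCTL_EN;
}
```

Enabling the engine on top of this value gives 0x1440821E, and clearing the UPE bit instead gives 0x14408214.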
Rx-Descriptor Control (0x2828)
[Register layout, fields listed from the high bit to the low bit]
Bits 31 down to 16: 0 | 0 | 0 | 0 | 0 | 0 | 0 | GRAN | 0 | 0 | WTHRESH (Writeback Threshold)
Bits 15 down to 0: 0 | 0 | HTHRESH (Host Threshold) | 0 | 0 | PTHRESH (Prefetch Threshold)
“This register controls the fetching and write back of receive descriptors. The three threshold values are used to determine when descriptors are read from, and written to, host memory. Their values can be in units of cache lines or of descriptors (each descriptor is 16 bytes), based on the value of the GRAN bit (0=cache lines, 1=descriptors). When GRAN = 1, all descriptors are written back (even if not requested).” --Intel manual
Recommended for 82573: 0x01010000 (GRAN=1, WTHRESH=1)
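A small packing function makes the recommended value easier to read. The function name is hypothetical, and the field positions (GRAN at bit 24, six-bit thresholds starting at bits 16, 8 and 0) are assumptions counted from the layout on this slide.

```c
#include <stdint.h>

/* Pack the four RXDCTL fields into one 32-bit register value.
 * Each threshold is masked to six bits, GRAN to one bit. */
static inline uint32_t rxdctl_value(unsigned gran, unsigned wthresh,
                                    unsigned hthresh, unsigned pthresh)
{
    return ((uint32_t)(gran    & 0x01u) << 24) |   /* GRAN */
           ((uint32_t)(wthresh & 0x3Fu) << 16) |   /* Writeback Threshold */
           ((uint32_t)(hthresh & 0x3Fu) << 8)  |   /* Host Threshold */
            (uint32_t)(pthresh & 0x3Fu);           /* Prefetch Threshold */
}
```

Calling `rxdctl_value(1, 1, 0, 0)` reproduces the recommended 0x01010000.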
PCI Bus Master DMA
[Figure: the 82573L’s i/o-memory (on-chip RX descriptors, on-chip TX descriptors, and RX and TX FIFOs of 32-KB total) exchanges data by DMA with the Host’s Dynamic Random Access Memory, which holds the Descriptor Queue and the packet-buffers]
Pthresh and Hthresh
• When the number of unprocessed descriptors in the NIC’s on-chip memory has fallen below the Prefetch Threshold, and the number of valid descriptors in host memory which are owned by the NIC is at least equal to the Host Threshold, then the NIC will fetch that number of descriptors in a single ‘burst’ DMA-transfer
Wthresh
• When the number of descriptors waiting in the NIC’s on-chip memory to be written back to Host memory is at least equal to the Writeback Threshold, then the NIC will write back that number of descriptors in a single ‘burst’ DMA-transfer
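The two threshold rules can be restated as simple predicates. This is only a model of the conditions as worded on these slides, not of the controller’s actual internal logic, and the function names are hypothetical.

```c
/* Prefetch rule: on-chip descriptors have fallen below PTHRESH and the
 * host holds at least HTHRESH valid descriptors owned by the nic. */
static inline int nic_should_prefetch(unsigned onchip_unprocessed,
                                      unsigned host_owned,
                                      unsigned pthresh, unsigned hthresh)
{
    return onchip_unprocessed < pthresh && host_owned >= hthresh;
}

/* Writeback rule: at least WTHRESH descriptors are waiting on-chip
 * to be written back to host memory. */
static inline int nic_should_writeback(unsigned pending_writeback,
                                       unsigned wthresh)
{
    return pending_writeback >= wthresh;
}
```

With WTHRESH set to 1, as in the recommended setting, the writeback predicate is true as soon as a single descriptor is waiting.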
Experiment #1
• Let’s install our ‘nicrx.c’ kernel module on one host, and use the ‘cat’ command to view its queue of Rx-Descriptors:
    $ /sbin/insmod nicrx.ko
    $ cat /proc/nicrx
• Then let’s install our ‘nictx.c’ module on a different host on the same local network:
    $ /sbin/insmod nictx.ko
• Now look again at the receive descriptors!
Experiment #2
• Install our ‘dram.c’ device-driver module on both of these host-machines, and use our ‘fileview’ utility to look at the contents of each module’s packet-buffers
• You’ll find their physical addresses displayed if you use ‘cat’ to see the descriptor-queues:
    $ cat /proc/nictx
    $ cat /proc/nicrx
Experiment #3
• Our ‘nicrx.c’ module had enabled both the Unicast and Multicast promiscuous modes
• So let’s watch what happens when we use the ‘/sbin/ifconfig’ command (with ‘sudo’) to bring up a secondary network interface on another host on the same segment of our local network
• Do you recognize these new packets?
Experiment #4
• With the ‘nicrx.c’ module installed on one host, log on to two other hosts on the same LAN and bring up their ‘eth1’ network interfaces
• Use the ‘ping’ command on one of these two hosts to try contacting the other one
• What do you observe about any packets that are received by the host where our ‘nicrx.c’ module had been installed?
In-class exercise
• Suppose you turn off the UPE-bit (bit #3) in the Receive Control register (in nicrx.c)
• From another host on the same segment, bring up its ‘eth1’ interface, then adjust its routing table so that all multicast packets are sent out via the secondary interface:
    $ sudo /sbin/route add -net 224.0.0.0 netmask 255.0.0.0 device eth1
• If you ‘ping’ a multicast address, will the ICMP datagram be received by ‘nicrx.c’?