82573L Initializing our Pro/1000
Chicken-and-Egg? • We want to create a Linux Kernel Module that can serve application-programs as a character-mode device-driver for our NIC • So, as with the UART device, we will need to implement ‘read()’ and ‘write()’ methods • But which method should we do first? • No way to “test” a ‘read()’ method without having a way to send packets to our NIC
How ‘transmit’ works
[Figure: a list of buffer-descriptors (descriptor0 through descriptor3, followed by zeroed entries) alongside packet buffers Buffer0 through Buffer3, all located in Random Access Memory]
• We set up each data-packet that we want to be transmitted in a ‘Buffer’ area in RAM
• We also create a list of buffer-descriptors and inform the NIC of its location and size
• Then, when ready, we tell the NIC to ‘Go!’ (i.e., start transmitting), and to let us know when these transmissions are ‘Done’
Registers’ Names
Memory-information registers
TDBA(L/H) = Transmit-Descriptor Base-Address Low/High (64-bits)
TDLEN = Transmit-Descriptor array Length
TDH = Transmit-Descriptor Head
TDT = Transmit-Descriptor Tail
Transmit-engine control registers
TXDCTL = Transmit-Descriptor Control Register
TCTL = Transmit Control Register
Notification timing registers
TIDV = Transmit Interrupt Delay Value
TADV = Transmit-interrupt Absolute Delay Value
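Tx-Desc Ring-Buffer
[Figure: circular array of descriptors at offsets 0x00, 0x10, 0x20, ... 0x80 from the TDBA base-address; TDLEN gives the array's length in bytes; TDH (head) and TDT (tail) mark the boundary between the descriptors owned by hardware (nic) and those owned by software (cpu)]
Circular buffer (128-bytes minimum)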
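The head/tail bookkeeping for such a ring can be sketched in a few lines of C. This is only an illustration, not code from the course demos: the helper names are hypothetical, and it assumes (as is conventional for this family of NICs) that the descriptors from head up to, but not including, tail are the ones currently owned by the hardware.

```c
#include <stdint.h>

#define DESC_SIZE 16u   /* each descriptor occupies 16 bytes */

/* Number of descriptors in a ring whose TDLEN value is 'tdlen' bytes. */
static inline uint32_t ring_count(uint32_t tdlen)
{
    return tdlen / DESC_SIZE;
}

/* Index that follows 'idx', wrapping back to 0 at the end of the ring. */
static inline uint32_t ring_next(uint32_t idx, uint32_t count)
{
    return (idx + 1u) % count;
}

/* Descriptors handed to the NIC but not yet processed (assumed convention:
   hardware owns the entries from 'head' up to, but not including, 'tail'). */
static inline uint32_t ring_pending(uint32_t head, uint32_t tail, uint32_t count)
{
    return (tail + count - head) % count;
}
```

With the minimum 128-byte ring, ring_count(128) gives 8 descriptors, and software would advance its tail index with ring_next() after filling in each descriptor.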
Tx-Descriptor Control (0x3828)
bit 24 = GRAN (Granularity)
bits 21..16 = WTHRESH (Writeback Threshold)
bits 13..8 = HTHRESH (Host Threshold)
bits 5..0 = PTHRESH (Prefetch Threshold)
all other bits = 0
“This register controls the fetching and write back of transmit descriptors. The three threshold values are used to determine when descriptors are read from, and written to, host memory. Their values can be in units of cache lines or of descriptors (each descriptor is 16 bytes), based on the value of the GRAN bit (0=cache lines, 1=descriptors). When GRAN = 1, all descriptors are written back (even if not requested).” --Intel manual
Recommended for 82573: 0x01010000 (GRAN=1, WTHRESH=1)
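As a cross-check on the recommended value, the register image can be assembled from its fields. The sketch below is illustrative only: the function and macro names are hypothetical, and the field positions (PTHRESH in bits 5..0, HTHRESH in bits 13..8, WTHRESH in bits 21..16, GRAN in bit 24) are read off the slide's bit diagram.

```c
#include <stdint.h>

#define TXDCTL_GRAN (1u << 24)   /* 0 = cache lines, 1 = descriptors */

/* Build a TXDCTL value from its three 6-bit thresholds and the GRAN bit. */
static inline uint32_t txdctl_value(uint32_t pthresh, uint32_t hthresh,
                                    uint32_t wthresh, int gran)
{
    return ((pthresh & 0x3Fu) << 0)
         | ((hthresh & 0x3Fu) << 8)
         | ((wthresh & 0x3Fu) << 16)
         | (gran ? TXDCTL_GRAN : 0u);
}
```

Calling txdctl_value(0, 0, 1, 1) yields 0x01010000, which matches the value recommended above.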
Transmit Control (0x0400)
bits 31..29 = reserved (=0)
bit 28 = MULR
bits 27..26 = TXCSCMT
bit 25 = UNORTX
bit 24 = RTLC
bit 23 = reserved (=0)
bit 22 = SWXOFF
bits 21..12 = COLD (Collision Distance)
bits 11..4 = CT (Collision Threshold)
bit 3 = PSP
bit 2 = reserved (=0)
bit 1 = EN
bit 0 = reserved (=0)
EN = Transmit Enable
PSP = Pad Short Packets
CT = Collision Threshold (=0xF)
COLD = Collision Distance (=0x3F)
SWXOFF = Software XOFF Transmission
RTLC = Retransmit on Late Collision
UNORTX = Underrun No Re-Transmit
TXCSCMT = TxDescriptor Minimum Threshold
MULR = Multiple Request Support
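A TCTL value that uses the settings named on this slide (EN and PSP set, CT=0xF, COLD=0x3F) might be composed as follows. The macro and function names are hypothetical, chosen for this sketch, and the bit positions are taken from the slide's layout; whether any further bits should be set depends on the operating mode you want.

```c
#include <stdint.h>

#define TCTL_EN          (1u << 1)    /* Transmit Enable */
#define TCTL_PSP         (1u << 3)    /* Pad Short Packets */
#define TCTL_CT_SHIFT    4            /* Collision Threshold, bits 11..4 */
#define TCTL_COLD_SHIFT  12           /* Collision Distance, bits 21..12 */

/* Compose a TCTL image from a collision threshold and collision distance,
   with transmit-enable and short-packet padding turned on. */
static inline uint32_t tctl_value(uint32_t ct, uint32_t cold)
{
    return TCTL_EN | TCTL_PSP
         | ((ct   & 0xFFu)  << TCTL_CT_SHIFT)
         | ((cold & 0x3FFu) << TCTL_COLD_SHIFT);
}
```

With the slide's suggested field values, tctl_value(0xF, 0x3F) evaluates to 0x0003F0FA.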
Tx Configuration Word (0x0178)
bit 31 = ANE (Auto-Negotiation Enable)
bit 30 = TxConfig (Transmit Configuration Control bit)
bits 29..16 = reserved (=0)
bits 15..0 = TxConfigWord (Transmit Configuration Word)
This register has two meanings, depending on the state of the ANE bit (i.e., setting ANE=1 enables the hardware auto-negotiation machine). Applicable only in SerDes mode; program as 0 for internal-PHY mode.
Legacy Tx-Descriptor Layout
offset 0x0: Buffer-Address low (bits 31..0)
offset 0x4: Buffer-Address high (bits 63..32)
offset 0x8: CMD (bits 31..24), CSO (bits 23..16), Packet Length in bytes (bits 15..0)
offset 0xC: special (bits 31..16), CSS (bits 15..8), reserved =0 (bits 7..4), status (bits 3..0)
Buffer-Address = the packet-buffer’s 64-bit address in physical memory
Packet-Length = number of bytes in the data-packet to be transmitted
CMD = Command-field
CSO/CSS = Checksum Offset/Start (in bytes)
STA = Status-field
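Suggested C syntax
typedef struct {
    unsigned long long base_addr;   // Buffer-Address (64 bits)
    unsigned short     pkt_length;  // Packet Length (in bytes)
    unsigned char      cksum_off;   // CSO = Checksum Offset
    unsigned char      desc_cmd;    // CMD = Command-field
    unsigned char      desc_stat;   // STA = Status-field
    unsigned char      cksum_org;   // CSS = Checksum Start
    unsigned short     special;
} tx_descriptor;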
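Because the NIC reads this structure directly from memory, it is worth confirming that the compiler lays it out as exactly 16 bytes with each field at the expected offset. The sketch below restates the structure using the fixed-width types from <stdint.h> (a substitution made here for portability, not something the slide prescribes) and checks the layout at compile time with C11's _Static_assert.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t base_addr;   /* bytes 0..7:   packet-buffer physical address */
    uint16_t pkt_length;  /* bytes 8..9:   packet length in bytes */
    uint8_t  cksum_off;   /* byte 10:      CSO */
    uint8_t  desc_cmd;    /* byte 11:      CMD */
    uint8_t  desc_stat;   /* byte 12:      status (low 4 bits) */
    uint8_t  cksum_org;   /* byte 13:      CSS */
    uint16_t special;     /* bytes 14..15: special */
} tx_descriptor;

/* Compilation fails if the layout ever differs from the hardware's format. */
_Static_assert(sizeof(tx_descriptor) == 16, "descriptor must be 16 bytes");
_Static_assert(offsetof(tx_descriptor, pkt_length) == 8,  "length at 0x8");
_Static_assert(offsetof(tx_descriptor, desc_cmd)   == 11, "CMD at 0xB");
_Static_assert(offsetof(tx_descriptor, desc_stat)  == 12, "status at 0xC");
_Static_assert(offsetof(tx_descriptor, special)    == 14, "special at 0xE");
```

The byte offsets assume a little-endian CPU such as x86, where bits 31..24 of the dword at offset 0x8 land in byte 11.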
TxDesc Command-field
bit 7 = IDE, bit 6 = VLE, bit 5 = DEXT, bit 4 = reserved (=0), bit 3 = RS, bit 2 = IC, bit 1 = IFCS, bit 0 = EOP
EOP = End Of Packet (1=yes, 0=no)
IFCS = Insert Frame CheckSum (1=yes, 0=no), provided EOP is set
IC = Insert CheckSum (1=yes, 0=no) as indicated by CSO/CSS fields
RS = Report Status (1=yes, 0=no)
DEXT = Descriptor Extension (1=yes, 0=no); use ‘0’ for Legacy-Mode
VLE = VLAN-Packet Enable (1=yes, 0=no), provided EOP is set
IDE = Interrupt-Delay Enable (1=yes, 0=no)
TxDesc Status field
bit 3 = reserved (=0), bit 2 = LC, bit 1 = EC, bit 0 = DD
DD = Descriptor Done: this bit is written back after the NIC processes the descriptor, provided the descriptor’s RS-bit was set (i.e., Report Status)
EC = Excess Collisions: indicates that the packet has experienced more than the maximum number of collisions (as defined by the TCTL.CT field) and therefore was not transmitted. (This bit is meaningful only in HALF-DUPLEX mode.)
LC = Late Collision: indicates that a Late Collision has occurred while operating in HALF-DUPLEX mode. Note that the collision window size depends on the SPEED: 64-bytes for 10/100-Mbps, or 512-bytes for 1000-Mbps.
Bit-mask definitions
enum {
    // Status-field bits
    DD   = (1<<0),   // Descriptor Done
    EC   = (1<<1),   // Excess Collisions
    LC   = (1<<2),   // Late Collision
    // Command-field bits
    EOP  = (1<<0),   // End Of Packet
    IFCS = (1<<1),   // Insert Frame CheckSum
    IC   = (1<<2),   // Insert CheckSum as per CSO/CSS
    RS   = (1<<3),   // Report Status
    DEXT = (1<<5),   // Descriptor Extension
    VLE  = (1<<6),   // VLAN packet
    IDE  = (1<<7)    // Interrupt-Delay Enable
};
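To show how these masks are meant to be combined, here is a minimal sketch of preparing one legacy descriptor for a single-buffer packet and later testing whether the NIC has finished with it. The helper names tx_desc_prepare() and tx_desc_done() are hypothetical, the structure is restated with fixed-width types so the block stands alone, and the choice of EOP, IFCS and RS is simply one plausible command setting.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    uint64_t base_addr;
    uint16_t pkt_length;
    uint8_t  cksum_off;
    uint8_t  desc_cmd;
    uint8_t  desc_stat;
    uint8_t  cksum_org;
    uint16_t special;
} tx_descriptor;

enum { DD = (1<<0), EOP = (1<<0), IFCS = (1<<1), RS = (1<<3) };

/* Describe one complete packet of 'len' bytes at physical address 'phys'. */
static inline void tx_desc_prepare(tx_descriptor *d, uint64_t phys, uint16_t len)
{
    memset(d, 0, sizeof *d);          /* clears status, CSO/CSS and special */
    d->base_addr  = phys;
    d->pkt_length = len;
    d->desc_cmd   = EOP | IFCS | RS;  /* RS asks the NIC to write back DD */
}

/* Nonzero once the NIC has written back the Descriptor Done bit. */
static inline int tx_desc_done(const volatile tx_descriptor *d)
{
    return (d->desc_stat & DD) != 0;
}
```

The volatile qualifier in tx_desc_done() reflects that the status byte is changed by the device, not by the CPU, so the compiler must reread it on every call.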
Allocating kernel-memory • Our 82573L device-driver will need to use a segment of contiguous physical memory which is cache-aligned and non-pageable • As explained in our LDD3 textbook, such a memory-block can be allocated using the Linux kernel’s ‘kmalloc()’ function (and it can later be deallocated using ‘kfree()’) • The maximum-size allocation is 128-KB • You should use the ‘GFP_KERNEL’ flag
Network MTU • Unless the ‘Large-Send’ functionality has been enabled, there will be a maximum length for your network ‘datagrams’ equal to 1536 bytes (=0x0600) • So if you reused the same Packet-Buffer for successive transmissions, you could fit your packet-buffer and a moderate-sized Descriptor-Buffer into one 4KB-pageframe
Single page-frame option
4KB Page-Frame:
Descriptor-Buffer (1-KB) (room for up to 64 descriptors, at 16 bytes each)
Packet-Buffer (3-KB) (reused for successive transmissions)
Another design-option…
4KB Page-Frame:
Descriptor-Buffer (128-bytes) (room for 8 descriptors, at 16 bytes each)
16 Packet-Buffers (3968-bytes) (248-bytes per buffer)
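The arithmetic behind these two page-frame layouts is easy to verify in code. The helper names below are hypothetical, and the sketch assumes 16-byte legacy descriptors and a 4096-byte page-frame.

```c
#include <stddef.h>

#define PAGE_FRAME_SIZE 4096u   /* one 4KB page-frame */
#define DESC_SIZE       16u     /* one legacy Tx-descriptor */

/* How many descriptors fit in a descriptor-buffer of 'desc_bytes' bytes. */
static inline size_t desc_capacity(size_t desc_bytes)
{
    return desc_bytes / DESC_SIZE;
}

/* Bytes left in the page-frame for packet data. */
static inline size_t packet_area(size_t desc_bytes)
{
    return PAGE_FRAME_SIZE - desc_bytes;
}
```

A 1-KB descriptor-buffer holds desc_capacity(1024) = 64 descriptors and leaves 3072 bytes (3 KB) for the packet-buffer; a 128-byte descriptor-buffer holds 8 descriptors and leaves 3968 bytes, which divides into sixteen 248-byte buffers.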
Initialization • Your device-driver needs to initialize your 82573L hardware to a known state, and configure its options for your desired mode of operation • The Device Control register has bits which let you initiate a ‘device reset’ operation • The Device Status register has bits which inform you when a ‘reset’ has completed
Device Status (0x0008)
bit 31 = ? (some undocumented functionality?)
bit 19 = GIO Master EN
bit 10 = PHY reset
bits 9..8 = ASDV
bits 7..6 = SPEED
bit 4 = TXOFF
bits 3..2 = Function ID
bit 1 = LU
bit 0 = FD
all other bits = 0
FD = Full-Duplex
LU = Link Up
TXOFF = Transmission Paused
SPEED (00=10Mbps, 01=100Mbps, 10=1000Mbps, 11=reserved)
ASDV = Auto-negotiation Speed Detection Value
Device Control (0x0000)
bit 31 = PHYRST
bit 30 = VME
bit 28 = TFCE
bit 27 = RFCE
bit 26 = RST
bit 20 = ADVD3WUC
bit 18 = D/UD status
bit 12 = FRCDPLX
bit 11 = FRCSPD
bits 9..8 = SPEED
bit 6 = SLU
bit 3 = reserved (=1)
bit 2 = GIOMD
bit 0 = FD
all other bits = reserved (=0)
FD = Full-Duplex
GIOMD = GIO Master Disable
SLU = Set Link Up
SPEED (00=10Mbps, 01=100Mbps, 10=1000Mbps, 11=reserved)
FRCSPD = Force Speed
FRCDPLX = Force Duplex
D/UD = Dock/Undock status
ADVD3WUC = Advertise Cold Wake Up Capability
RST = Device Reset
RFCE = Rx Flow-Control Enable
TFCE = Tx Flow-Control Enable
VME = VLAN Mode Enable
PHYRST = Phy Reset
Extended Control (0x0018)
[Figure: bit-layout diagram of the Extended Control register, bits 31..0]
ASDCHK = AutoSpeed Detection Check
EERST = EEPROM Reset
SPDBYPS = Speed-selection Bypass
RODIS = Relaxed-Ordering Disable
DMADynGE = DMA Dynamic-Gating Enable
PhyPwrDownEn = Phy PowerDown Enable
TxLSFlow = Tx Large-Send Flow
TxLS = Tx Large-Send functionality
PBPAREN = Packet-Buffer Parity-Error Detect
DFPAREN = Descriptor-FIFO Parity-Error Detect
IAME = Interrupt-Acknowledge Auto-Mask Enable
ITCE = Interrupt Timers Cleared Enable
Example
// clear STATUS bit #31
iowrite32( 0x00000000, io + E1000_STATUS );
// initiate Device-Reset and Phy-Reset
iowrite32( 0x84000000, io + E1000_CTRL );
// wait until STATUS bit #31 is set
while ( ( ioread32( io + E1000_STATUS ) & (1u<<31) ) == 0 );
// program Link Up with desired operating-mode settings
iowrite32( 0x00040241, io + E1000_CTRL );
// wait until LU-bit (bit #1) in STATUS is set
while ( ( ioread32( io + E1000_STATUS ) & (1u<<1) ) == 0 );
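One caution about busy-waiting loops like these: if the awaited bit never appears (for example, when no cable is attached), the loop never ends. A possible refinement is to bound the number of polls. The sketch below shows the idea in ordinary user-space C with a simulated register; poll_until_set() and fake_status() are hypothetical names, and in a real driver the callback would wrap ioread32() and would probably sleep briefly between reads.

```c
#include <stdint.h>

/* Hypothetical register-read callback; a driver would wrap ioread32() here. */
typedef uint32_t (*reg_read_fn)(void *ctx);

/* Poll until (value & mask) is nonzero, giving up after 'max_tries' reads.
   Returns the number of reads used on success, or -1 on timeout. */
static inline int poll_until_set(reg_read_fn rd, void *ctx,
                                 uint32_t mask, int max_tries)
{
    for (int i = 1; i <= max_tries; i++)
        if (rd(ctx) & mask)
            return i;
    return -1;
}

/* Simulated status register for demonstration: bit #1 appears on the
   third read and no other bit is ever set. */
static inline uint32_t fake_status(void *ctx)
{
    int *reads = ctx;
    return (++*reads >= 3) ? (1u << 1) : 0u;
}
```

With a read counter initialized to zero, polling fake_status for bit #1 succeeds on the third read, while polling it for a bit that never appears returns -1 once the limit is reached instead of hanging.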
Interrupt Cause Read (0x00C0)
Mechanism for NIC-event notifications
bit 31 = INT-Assert
bit 17 = ACK
bit 16 = SRPD
bit 15 = TXDLOW
bit 9 = MDAC
bit 7 = RXT0
bit 6 = RXO
bit 4 = RXDMT0
bit 2 = LSC
bit 1 = TXQE
bit 0 = TXDW
all other bits = reserved (=0)
TXDW = Transmit Descriptor Written back
TXQE = Transmit Queue Empty
LSC = Link Status Changed
RXDMT0 = Receive Descriptor Minimum Threshold Reached
RXO = Receiver Overrun
RXT0 = Receiver Timer Interrupt
MDAC = MDI/O Access Completed
TXDLOW = Transmit Descriptor Low Threshold Reached
SRPD = Small Receive Packet Detected
ACK = Receive ACK-frame detected
INT-Assert = Interrupt Assertion is still pending
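When a raw ICR value shows up in a log message, it helps to translate it into the event names listed on this slide. The following user-space sketch does that; the table and the icr_decode() helper are hypothetical, and the bit positions are the ones read from the slide's diagram.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static const struct { uint32_t mask; const char *name; } icr_bits[] = {
    { 1u << 0,  "TXDW"   }, { 1u << 1,  "TXQE"   }, { 1u << 2,  "LSC"  },
    { 1u << 4,  "RXDMT0" }, { 1u << 6,  "RXO"    }, { 1u << 7,  "RXT0" },
    { 1u << 9,  "MDAC"   }, { 1u << 15, "TXDLOW" }, { 1u << 16, "SRPD" },
    { 1u << 17, "ACK"    }, { 1u << 31, "INT-Assert" },
};

/* Write the space-separated names of the bits set in 'icr' into 'buf'
   (at most 'len' bytes including the terminator) and return 'buf'. */
static inline char *icr_decode(uint32_t icr, char *buf, size_t len)
{
    size_t used = 0;
    if (len)
        buf[0] = '\0';
    for (size_t i = 0; i < sizeof icr_bits / sizeof icr_bits[0]; i++) {
        if (!(icr & icr_bits[i].mask))
            continue;
        int n = snprintf(buf + used, len - used, "%s%s",
                         used ? " " : "", icr_bits[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break;                    /* out of room: stop at what fits */
        used += (size_t)n;
    }
    return buf;
}
```

For instance, an ICR value of 0x00000007 decodes to the three names TXDW, TXQE and LSC.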
In-Class Exercise #1 • Try compiling and installing our ‘tryreset.c’ demo-module, and examine the messages put in the kernel’s log-file (use ‘dmesg’) • Then modify the module-code so that it also outputs the value in the ICR register (Interrupt Cause Read) during each pass through the two ‘busy-waiting’ loops • #define E1000_ICR 0x00C0
In-Class Exercise #2 • Apply the same techniques we employed in our earlier ‘announce.c’ demo-module so that the ‘printk()’ statements in ‘tryreset.c’ get replaced by statements that will show the messages onscreen, or in the current desktop window, rather than writing them to the kernel’s (out-of-view) log-file