Network Driver in Linux 2.4. 潘仁義, CCU COMM.
Overview
• Diagram: the driver sits between the Device, the Bus, and the Operating System
• Topics shown: auto configuration; I/O access; byte ordering; address translation; bus cycles; direct memory access; power management; driver framework; timer management; memory management; race condition handling (SMP); CPU/memory cache consistency; device operations; interrupt handling
Outline
• Driver framework
• Linux network drivers
• Device operation
• RTL8139 programming
• Driver example
• A piece of code for the 93C46 series
• EEPROM: 93C46 is 64 x 16 bits, 93C66 is 256 x 16 bits
• pci_skeleton.c (for RTL8139)
Linux network driver framework: Connecting to the kernel (1/2)
• Module loading
  • struct net_device snull_dev = { init: snull_init, }; // initialization function
  • if ((result = register_netdev(&snull_dev))) printk("error");
  • Before the call, set name to "eth%d" so that the kernel assigns an "ethX" name
  • register_netdev() calls dev->init() internally
• snull_init()
  • The probe function
  • Called from register_netdev()
  • Usually avoids claiming I/O ports and the IRQ; that is delayed until dev->open() time
  • Fills in the "dev" structure
  • ether_setup(dev)
  • Sets up the private data structure "priv"; a network interface lives as long as the system does, so statistics can be kept there
• Module unloading
  • kfree(priv);
  • unregister_netdev(&snull_dev);
Linux network driver framework: Connecting to the kernel (2/2)
• struct net_device {
  • char name[IFNAMSIZ]; // eth%d
  • unsigned long base_addr; unsigned char irq;
  • unsigned char broadcast[MAX_ADDR_LEN], dev_addr[MAX_ADDR_LEN];
  • unsigned short flags; // IFF_UP, IFF_PROMISC, IFF_ALLMULTI
  • Function pointers:
    • (*init): initialization
    • (*open): bring the interface up
    • (*stop): shut the interface down
    • (*do_ioctl)()
    • (*tx_timeout): transmit timeout handling
    • (*get_stats): report statistics
    • (*hard_start_xmit): send a packet
    • (*set_multicast_list): handle changes to the multicast list and to flags
  • unsigned long trans_start, last_rx; // for watchdog and power management
  • struct dev_mc_list *mc_list; // multicast address list
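Linux network driver framework: Opening and closing
• Before an interface can carry packets, it must be brought up with ifconfig and given an IP address
• When ifconfig assigns an IP address to the interface:
  • ioctl(SIOCSIFADDR) sets the software address of the interface
  • ioctl(SIOCSIFFLAGS) asks the driver to turn the interface on or off, which triggers open and stop
• open()
  • Acquire the required system resources (claim the IRQ, I/O base, buffers)
  • Tell the interface hardware to start
  • Read the MAC address and copy it to dev->dev_addr (this can also be done at init or probe time)
  • Write dev->dev_addr into the interface's MAC registers
• stop()
  • Stop the interface hardware
  • Release the system resources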
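Linux network driver framework: Packet transmission (when the kernel needs to send a data packet)
• The data is placed on the outgoing packet queue
• The kernel calls the driver method
  • hard_start_xmit(struct sk_buff *skb, struct net_device *dev)
  • The method only hands the packet to the NIC; the NIC sends it onto the network afterwards (for example the RTL8139)
  • spinlock_t xmit_lock; the method can be called again only after it has returned
  • In practice the NIC is still busy transmitting the packet it was just given when the method returns
• NIC buffers are small; when they are full the driver must tell the kernel not to accept new transmit requests: netif_stop_queue(), netif_wake_queue(), netif_start_queue()
• Note: there are also netif_carrier_on/off() for carrier loss detection/watchdog, and netif_device_attach/detach() for hot-plugging/power management
• Every packet the kernel handles is wrapped in a struct sk_buff
  • sk_buff stands for socket buffer
  • A pointer to an sk_buff is usually named skb
  • skb->data points to the packet about to be sent
  • skb->len is the length of that packet, in octets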
Linux network driver framework: Transmission queuing model
• Diagram: packets from the OS pass three checks before they go to the LAN
  • Present? netif_device_attach() / netif_device_detach()
  • Queue stopped? netif_start_queue() / netif_wake_queue() / netif_stop_queue()
  • Carrier ok? netif_carrier_on() / netif_carrier_off()
• Watchdog: if (present && carrier_ok && queue_stopped && (jiffies - trans_start) > watchdog_timeo) then call tx_timeout(), which updates the statistics and sets things up so that packets can be sent again
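The watchdog condition on this slide can be written out as a small user-space predicate. This is only a sketch: struct tx_state and tx_watchdog_expired() are hypothetical names used for illustration, not kernel symbols, and the fields simply mirror the four terms of the slide's condition.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the state the watchdog examines. */
struct tx_state {
    bool present;                 /* toggled by netif_device_attach()/detach() */
    bool carrier_ok;              /* toggled by netif_carrier_on()/off() */
    bool queue_stopped;           /* toggled by netif_stop_queue()/wake_queue() */
    unsigned long trans_start;    /* jiffies value at the last transmit */
    unsigned long watchdog_timeo; /* timeout, in jiffies */
};

/* Returns true when tx_timeout() would be called under the slide's rule. */
static bool tx_watchdog_expired(const struct tx_state *s, unsigned long jiffies)
{
    assert(s);
    /* Unsigned subtraction keeps the comparison valid across counter wraparound. */
    return s->present && s->carrier_ok && s->queue_stopped &&
           (jiffies - s->trans_start) > s->watchdog_timeo;
}
```

With trans_start = 100 and watchdog_timeo = 50, the predicate stays false up to jiffies = 150 and becomes true at 151, but only while the queue is stopped.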
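Linux network driver framework: Packet reception
• Packet reception usually starts with an interrupt raised by the network hardware
  • The work is mostly done in the interrupt handler
  • Allocate an sk_buff and hand it to the kernel's network subsystem
• Interrupt-based reception is more efficient than polling
• Example: snull_rx()
  • skb = dev_alloc_skb(len + 2); // allocates with GFP_ATOMIC, so it can be used in an ISR
  • skb_reserve(skb, 2); // align the IP header on a 16-byte boundary
  • memcpy(skb_put(skb, len), receive_packet, len); // for skb_put(), see the sk_buff slide
  • Fill in the related fields
    • skb->dev = dev;
    • skb->protocol = eth_type_trans(skb, dev);
    • skb->ip_summed = CHECKSUM_UNNECESSARY; /* no need to check */
    • CHECKSUM_HW (computed by hardware) / CHECKSUM_NONE (still to be computed, the default) / CHECKSUM_UNNECESSARY (not computed)
  • netif_rx(skb); // hand the packet to the kernel's network subsystem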
Linux network driver framework: The interrupt handler
• An interrupt occurs when
  • A new packet has arrived
  • Transmission of an outgoing packet has completed
  • Something else happened: PCI bus error, cable length change, timeout
• Interrupt status register (ISR)
• Packet reception
  • Pass the packet to the kernel
• Packet transmission is completed
  • Reset the transmit buffer of the interface
  • Update statistics
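Linux network driver framework: The socket buffers (struct sk_buff)
• Diagram (including an empty sk_buff): the buffer is laid out as headroom, payload, tailroom, marked by the pointers head, data, tail, end; len is the payload length
• struct sk_buff *dev_alloc_skb(len): allocate
• void dev_kfree_skb(struct sk_buff *): free
• void skb_reserve(skb, len): reserve space at the front
• unsigned char *skb_put(skb, len): append data
• unsigned char *skb_push(skb, len): prepend data
• unsigned char *skb_pull(skb, len): remove data from the front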
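To make the pointer movements concrete, here is a toy user-space model of the four pointers. It is a sketch only: struct toy_skb and the toy_* functions are hypothetical stand-ins, not the kernel's struct sk_buff API, and they model nothing beyond head, data, tail, end and len.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct sk_buff: just the four pointers and len. */
struct toy_skb {
    unsigned char *head, *data, *tail, *end;
    size_t len;
};

static void toy_init(struct toy_skb *skb, unsigned char *buf, size_t size)
{
    skb->head = skb->data = skb->tail = buf;
    skb->end = buf + size;
    skb->len = 0;
}

/* Like skb_reserve(): open up headroom; only valid on an empty buffer. */
static void toy_reserve(struct toy_skb *skb, size_t n)
{
    assert(skb->len == 0 && (size_t)(skb->end - skb->tail) >= n);
    skb->data += n;
    skb->tail += n;
}

/* Like skb_put(): grow the payload at the tail, return the start of the new area. */
static unsigned char *toy_put(struct toy_skb *skb, size_t n)
{
    unsigned char *old_tail = skb->tail;
    assert((size_t)(skb->end - skb->tail) >= n);
    skb->tail += n;
    skb->len += n;
    return old_tail;
}

/* Like skb_push(): grow the payload into the headroom, return the new data pointer. */
static unsigned char *toy_push(struct toy_skb *skb, size_t n)
{
    assert((size_t)(skb->data - skb->head) >= n);
    skb->data -= n;
    skb->len += n;
    return skb->data;
}

/* Like skb_pull(): drop n bytes from the front, return the new data pointer. */
static unsigned char *toy_pull(struct toy_skb *skb, size_t n)
{
    assert(skb->len >= n);
    skb->data += n;
    skb->len -= n;
    return skb->data;
}
```

Reserving 2 bytes and then putting a frame with a 14-byte Ethernet header places the following IP header at offset 16 from head, which is the alignment the receive path aims for with skb_reserve(skb, 2).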
Linux network driver framework: Setup receive mode and multicast accept list
• Address types: unicast, broadcast (all 1s), multicast (bit 0 == 1)
• Receive modes: receive all, receive all multicast, receive a list of multicast addresses
• Transmit
  • The same as unicast
• Receive
  • Hardware filtering for a list of multicast addresses
• void (*set_multicast_list)(dev)
  • Called by the kernel when the list of multicast addresses to receive, or dev->flags, changes
• struct dev_mc_list *mc_list; // int mc_count
  • Linked list of all the multicast addresses the device must receive
• IFF_PROMISC
  • When set, the interface enters promiscuous mode (receive everything)
• IFF_ALLMULTI
  • Receive all multicast packets
Device operation: RTL8139(A/B) programming
• Packet transmission
  • 4 transmit descriptors used in round-robin
  • Transmit FIFO and Early Transmit
• Packet reception
  • Ring buffer in physically contiguous memory
  • Receive FIFO and FIFO Threshold
• Hardware initialization
  • Command register (0x37)
    • Reset (bit 4) / Transmit Enable (bit 2) / Receive Enable (bit 3) / Buffer empty (bit 0)
  • Transmit (Tx) Configuration Register (0x40~0x43)
    • Interframe Gap time (bits 25~24) (the "crab" card, a nickname for Realtek NICs)
  • Receive (Rx) Configuration Register (0x44~0x47)
    • Rx FIFO threshold (bits 15~13)
    • Accept Broadcast (bit 3) / Multicast (bit 2) / All (bit 0, promiscuous mode) packets
    • Rx buffer length (bits 12~11)
  • Interrupt Mask Register (0x3C~0x3D)
• Software initialization (Tx descriptors and ring buffer)
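The bit positions listed on this slide can be turned into constants and a small composer for the Rx configuration value. This is a sketch under stated assumptions: the CMD_* and RX_* names and rx_config() are hypothetical, only the fields named on the slide are modelled, and the threshold and buffer-length arguments are raw field codes whose meanings come from the datasheet, not from this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Command register (0x37) bits, as listed on the slide. */
enum {
    CMD_BUF_EMPTY = 1u << 0,
    CMD_TX_ENABLE = 1u << 2,
    CMD_RX_ENABLE = 1u << 3,
    CMD_RESET     = 1u << 4
};

/* Rx configuration accept bits, as listed on the slide. */
enum {
    RX_ACCEPT_ALL       = 1u << 0, /* promiscuous mode */
    RX_ACCEPT_MULTICAST = 1u << 2,
    RX_ACCEPT_BROADCAST = 1u << 3
};

/* Pack the Rx FIFO threshold code (bits 15~13), the Rx buffer length code
 * (bits 12~11) and the accept bits into one register value. */
static uint32_t rx_config(unsigned fifo_thresh, unsigned buf_len, uint32_t accept)
{
    assert(fifo_thresh <= 7 && buf_len <= 3);
    assert((accept & ~0xDu) == 0); /* only bits 3, 2 and 0 are modelled here */
    return ((uint32_t)fifo_thresh << 13) | ((uint32_t)buf_len << 11) | accept;
}
```

For example, enabling both directions is CMD_RX_ENABLE | CMD_TX_ENABLE (0x0C), and rx_config(7, 2, RX_ACCEPT_BROADCAST | RX_ACCEPT_MULTICAST) yields 0xF00C.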
RTL8139 packet transmission: Transmit descriptor
• Transmit start address (TSAD0-3)
  • The physical address of the packet
  • The packet must be in physically contiguous memory
• Transmit status (TSD0-3)
  • TOK (bit 15, R)
    • When set to 1, indicates that packet transmission completed successfully and no transmit underrun (bit 14, R) occurred
  • OWN (bit 13, R/W)
    • Set to 1 when the Tx DMA operation of this descriptor has completed
    • The driver must set this bit to 0 when the "Size" is written
  • Size (bits 12~0, R/W)
    • The total size in bytes of the data in this descriptor
  • Early Tx Threshold (bits 21~16, R/W)
    • When the byte count in the Tx FIFO reaches this value, transmission starts
    • From 000001 to 111111 in units of 32 bytes (000000 = 8 bytes)
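The TSD field layout above can be sketched as a composer and a status check. The macro and function names here are hypothetical; the bit positions are the ones on the slide, and the special 000000 (8-byte) threshold encoding is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

#define TSD_SIZE_MASK    0x1FFFu     /* bits 12~0: size in bytes */
#define TSD_OWN          (1u << 13)  /* set by the chip when Tx DMA is done */
#define TSD_TUN          (1u << 14)  /* transmit underrun */
#define TSD_TOK          (1u << 15)  /* transmit OK */
#define TSD_ERTXTH_SHIFT 16          /* bits 21~16: early Tx threshold */

/* Build the value the driver writes to TSDn: size plus early Tx threshold,
 * with OWN left at 0, which hands the descriptor to the chip. */
static uint32_t tsd_compose(unsigned size, unsigned threshold_bytes)
{
    assert(size <= TSD_SIZE_MASK);
    assert(threshold_bytes >= 32 && threshold_bytes % 32 == 0);
    assert(threshold_bytes / 32 <= 63);
    return ((uint32_t)(threshold_bytes / 32) << TSD_ERTXTH_SHIFT) | size;
}

/* Nonzero once the chip reports the packet fully sent on the line. */
static int tsd_tx_ok(uint32_t tsd)
{
    return (tsd & TSD_TOK) != 0;
}
```

A 60-byte packet with a 256-byte threshold encodes as 0x8003C. Adding 0x00020000 to such a value raises the threshold field by 2, that is by 64 bytes.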
RTL8139 packet transmission: Process of transmitting a packet
• Copy the packet to a physically contiguous buffer in memory
• Write the descriptor that is currently in use
  • Address, Size, Early transmit threshold, clear the OWN bit (this starts the PCI operation)
• When the Tx FIFO reaches the threshold, the chip starts to move data from the FIFO to the line
• When the whole packet has been moved to the FIFO, the OWN bit is set to 1
• When the whole packet has been moved to the line, TOK(TSD) is set to 1
• If TOK(IMR) is set, then TOK(ISR) is set and an interrupt is triggered
• The interrupt service routine is called; the driver should clear TOK(ISR)
Packet reception: Ring buffer
• Data goes to the Rx FIFO
  • It comes from the line
• Data moves to the buffer
  • When the early receive threshold is met
• Ring buffer
  • Physically contiguous
• CBR (0x3A~3B, R)
  • The current address of data moved to the buffer
• CAPR (0x38~39, R/W)
  • The pointer that keeps the current address of the packet that has been read
• Status of receiving a packet
  • Stored in front of the packet (packet header)
Packet reception: The packet header (32 bits, i.e. 4 bytes)
• Bits 31~16: rx_size, including the 4-byte CRC at the tail
• pkt_size = rx_size - 4
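Decoding the header can be sketched as follows. Assumptions are labelled: struct rx_header and rx_header_decode() are hypothetical names, the low 16 bits are treated as the receive status word (the slide only spells out bits 31~16), and the header is read as little-endian, which is what the driver's le32_to_cpu() implies.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decoded view of the 4-byte packet header. */
struct rx_header {
    uint16_t status;   /* bits 15~0 (assumed: receive status) */
    uint16_t rx_size;  /* bits 31~16, includes the 4-byte CRC */
    uint16_t pkt_size; /* rx_size - 4 */
};

static struct rx_header rx_header_decode(const unsigned char *p)
{
    /* Assemble the word byte by byte so the host's endianness does not matter. */
    uint32_t word = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                    ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    struct rx_header h;

    h.status = (uint16_t)(word & 0xFFFFu);
    h.rx_size = (uint16_t)(word >> 16);
    assert(h.rx_size >= 4); /* must at least hold the CRC */
    h.pkt_size = (uint16_t)(h.rx_size - 4);
    return h;
}
```

For header bytes 01 00 40 00 this gives status 1, rx_size 64 and pkt_size 60.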
Packet reception: Process of packet receive in detail
• Data received from the line is stored in the receive FIFO
• When the Early Receive Threshold is met, data is moved from the FIFO to the receive buffer
• After the whole packet has been moved from the FIFO to the receive buffer, the receive packet header (receive status and packet length) is written in front of the packet
• CBR is updated to the end of the packet, with 4-byte alignment
• CMD(BufferEmpty) is cleared and ISR(ROK) is set
• The ISR routine is called; the driver then clears ISR(ROK) and updates CAPR
  • cur_rx = (cur_rx + rx_size + 4 + 3) & ~3; // + 4 for the packet header, + 3 and & ~3 for 4-byte alignment
  • NETDRV_W16_F (RxBufPtr, cur_rx - 16); // - 16 to avoid overflow
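The two update expressions at the end of this slide are plain arithmetic, so they can be checked in user space. The function names below are hypothetical; the formulas are the ones on the slide.

```c
#include <assert.h>
#include <stdint.h>

/* Advance past one packet: skip the 4-byte header and rx_size bytes,
 * then round up to the next 4-byte boundary. */
static uint32_t rx_next_offset(uint32_t cur_rx, uint32_t rx_size)
{
    assert(cur_rx % 4 == 0); /* the offset stays 4-byte aligned */
    return (cur_rx + rx_size + 4 + 3) & ~3u;
}

/* Value written to the 16-bit CAPR register for a given offset. */
static uint16_t rx_capr_value(uint32_t cur_rx)
{
    return (uint16_t)(cur_rx - 16);
}
```

Starting at offset 0 with rx_size = 64, the next offset is 68 and the CAPR value is 52. A following packet with rx_size = 61 moves the offset to 136, showing the round-up.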
EEPROM 93C46 operations
• 93C46 Command Register (0x50, R/W)
A piece of code for the EEPROM 93C46

#define EE_SHIFT_CLK 0x04 /* EEPROM shift clock. */
#define EE_CS 0x08 /* EEPROM chip select. */
#define EE_DATA_WRITE 0x02 /* EEPROM chip data in. */
#define EE_DATA_READ 0x01 /* EEPROM chip data out. */
#define EE_ENB (0x80 | EE_CS)
#define eeprom_delay() readl(ee_addr)
/* EEPROM commands include the always-set leading bit */
#define EE_WRITE_CMD (5)
#define EE_READ_CMD (6)
#define EE_ERASE_CMD (7)

static int __devinit read_eeprom (void *ioaddr, int location, int addr_len)
{
    int i;
    unsigned retval = 0;
    void *ee_addr = ioaddr + Cfg9346;
    int read_cmd = location | (EE_READ_CMD << addr_len);

    writeb (EE_ENB & ~EE_CS, ee_addr);
    writeb (EE_ENB, ee_addr);
    eeprom_delay ();

    /* Shift the read command bits out. */
    for (i = 4 + addr_len; i >= 0; i--) {
        int dataval = (read_cmd & (1 << i)) ? EE_DATA_WRITE : 0;
        writeb (EE_ENB | dataval, ee_addr);
        eeprom_delay ();
        writeb (EE_ENB | dataval | EE_SHIFT_CLK, ee_addr);
        eeprom_delay ();
    }
    writeb (EE_ENB, ee_addr);
    eeprom_delay ();

    /* Clock the 16 data bits in, most significant bit first. */
    for (i = 16; i > 0; i--) {
        writeb (EE_ENB | EE_SHIFT_CLK, ee_addr);
        eeprom_delay ();
        retval = (retval << 1) | ((readb (ee_addr) & EE_DATA_READ) ? 1 : 0);
        writeb (EE_ENB, ee_addr);
        eeprom_delay ();
    }

    /* Terminate the EEPROM access. */
    writeb (~EE_CS, ee_addr);
    eeprom_delay ();
    return retval;
}

/* Caller: pick the address length, then read the MAC address from words 7~9. */
addr_len = read_eeprom (ioaddr, 0, 8) == 0x8129 ? 8 : 6;
for (i = 0; i < 3; i++)
    ((u16 *) (dev->dev_addr))[i] = le16_to_cpu (read_eeprom (ioaddr, i + 7, addr_len));
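To see which bits the first loop of read_eeprom() puts on the data line, the shifting can be replayed in user space without any hardware access. This is a sketch: eeprom_read_cmd_bits() is a hypothetical helper, and it only reproduces the loop bounds and the read_cmd expression from the code above.

```c
#include <assert.h>

#define EE_READ_CMD 6 /* start bit plus read opcode, as in the driver code */

/* Fill bits[] with the values shifted out for a read of `location`,
 * in transmission order. Returns the number of bits written. */
static int eeprom_read_cmd_bits(int location, int addr_len, int bits[], int max_bits)
{
    int read_cmd = location | (EE_READ_CMD << addr_len);
    int n = 0;
    int i;

    assert(addr_len == 6 || addr_len == 8);
    assert(location >= 0 && location < (1 << addr_len));
    assert(max_bits >= addr_len + 5);

    /* Same bounds as the driver loop: bit (4 + addr_len) down to bit 0. */
    for (i = 4 + addr_len; i >= 0; i--)
        bits[n++] = (read_cmd >> i) & 1;
    return n;
}
```

Reading word 7 with a 6-bit address sends 11 bits: two leading zeros, the pattern 110 (start bit and read opcode), then the address 000111.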
#include<> of the RTL8139 (pci-skeleton.c)
• Operating system
  • module.h: MOD_*, MODULE_*()
  • init.h: module_init(), module_exit()
  • kernel.h: barrier(), printk()
  • delay.h: udelay() definition
  • asm/io.h: definitions of I/O port read/write and ioremap()
  • crc32.h: ether_crc(), used for the multicast hash
  • byteorder.h, spinlock.h, config.h
  • Indirectly included: sched.h (irq, jiffies, capable), slab.h, time.h, spinlock.h, asm/atomic.h, skbuff.h
• PCI bus
  • pci.h: PCI defines and prototypes: pci_alloc_consistent(), pci_resource_*(), pci_request_regions(), pci_set_master(), pci_read_config_word()(err)
• Network device
  • netdevice.h: definitions for struct net_device, register_netdev(), netif_*(); skbuff.h
  • etherdevice.h: definitions for Ethernet, eth_type_trans(), alloc_etherdev()
  • mii.h: definitions for MII_ADVERTISE, MII_LPA, ADVERTISE_FULL, LPA_100FULL…
Driver structure of the RTL8139
• pci_module_init() / pci_unregister_driver()

static struct pci_driver netdrv_pci_driver = {
    name: "netdrv",
    id_table: netdrv_pci_tbl,
    probe: netdrv_init_one,
    remove: netdrv_remove_one,
#ifdef CONFIG_PM
    suspend: netdrv_suspend,
    resume: netdrv_resume,
#endif
};

/* pci_device_id entries; the last field is driver_data (private, here a sequence number) */
static struct pci_device_id netdrv_pci_tbl[] __devinitdata = {
    {0x10ec, 0x8139, PCI_ANY_ID, PCI_ANY_ID, 0, 0, RTL8139 },
    {0,} /* terminating entry */
};
MODULE_DEVICE_TABLE (pci, netdrv_pci_tbl);
PCI device probe function: netdrv_init_one() (Linux invokes it when probing)
• netdrv_init_one (struct pci_dev *pdev, struct pci_device_id *ent)
  • Call netdrv_init_board() to get net_device dev and void *ioaddr
  • Initialize net_device dev
    • Set up dev_addr[], irq, base_addr
    • Set up methods: dev->open, dev->hard_start_xmit, dev->stop, dev->get_stats, dev->set_multicast_list, dev->do_ioctl, dev->tx_timeout
  • register_netdev (dev); // ethX
• netdrv_init_board()
  • dev = alloc_etherdev(sizeof());
  • pci_enable_device (pdev);
  • pci_request_regions (pdev, "pci-sk"); // register the I/O ports and memory
  • pci_set_master (pdev);
  • mmio_start = pci_resource_start (pdev, 1);
  • ioaddr = ioremap (mmio_start, len);
  • Soft reset the chip: NETDRV_W8 (ChipCmd, (NETDRV_R8 (ChipCmd) & ChipCmdClear) | CmdReset);
  • Identify the chip attached to the board
NETDRV_W?()

/* write MMIO register, with flush */
/* Flush avoids rtl8139 bug w/ posted MMIO writes */
#define NETDRV_W8_F(reg, val8) do { writeb ((val8), ioaddr + (reg)); readb (ioaddr + (reg)); } while (0)
#define NETDRV_W16_F(reg, val16) do { writew ((val16), ioaddr + (reg)); readw (ioaddr + (reg)); } while (0)
#define NETDRV_W32_F(reg, val32) do { writel ((val32), ioaddr + (reg)); readl (ioaddr + (reg)); } while (0)

#define NETDRV_W8 NETDRV_W8_F
#define NETDRV_W16 NETDRV_W16_F
#define NETDRV_W32 NETDRV_W32_F

#define NETDRV_R8(reg) readb (ioaddr + (reg))
#define NETDRV_R16(reg) readw (ioaddr + (reg))
#define NETDRV_R32(reg) ((unsigned long) readl (ioaddr + (reg)))
Device methods
• dev->open: int netdrv_open (struct net_device *dev);
• dev->hard_start_xmit: int netdrv_start_xmit (struct sk_buff *skb, struct net_device *dev);
• dev->stop: int netdrv_close (…);
• dev->get_stats: struct net_device_stats *netdrv_get_stats (struct net_device *);
• dev->set_multicast_list: void netdrv_set_rx_mode (…);
• dev->do_ioctl: int netdrv_ioctl (struct net_device *dev, struct ifreq *rq, int cmd);
• dev->tx_timeout: void netdrv_tx_timeout (struct net_device *dev);
Bringing the interface up: netdrv_open()
• netdrv_open()
  • request_irq (dev->irq, netdrv_interrupt, SA_SHIRQ, dev->name, dev)
  • tx_bufs = pci_alloc_consistent(pdev, TXBUFLEN, &tx_bufs_dma);
  • rx_ring = pci_alloc_consistent(pdev, RXBUFLEN, &rx_ring_dma);
  • netdrv_init_ring (dev);
  • netdrv_hw_start (dev);
  • Set the timer to check for link beat
• netdrv_hw_start (dev)
  • Soft reset the chip
  • /* Restore our idea of the MAC address. */
    • NETDRV_W32_F (MAC0 + 0, cpu_to_le32 (*(u32 *) (dev->dev_addr + 0)));
    • NETDRV_W32_F (MAC0 + 4, cpu_to_le32 (*(u32 *) (dev->dev_addr + 4)));
  • NETDRV_W8_F (ChipCmd, (NETDRV_R8 (ChipCmd) & ChipCmdClear) | CmdRxEnb | CmdTxEnb);
  • Set RxConfig and TxConfig
  • NETDRV_W32_F (RxBuf, tp->rx_ring_dma);
  • Initialize the Tx buffer DMA addresses
  • netdrv_set_rx_mode (dev);
  • NETDRV_W16_F (IntrMask, netdrv_intr_mask);
  • netif_start_queue (dev);
Setup receive mode and multicast hash table: (*set_multicast_list)() is netdrv_set_rx_mode()
• if (flags & IFF_PROMISC)
  • AcceptBroadcast | AcceptMulticast | AcceptMyPhys | AcceptAllPhys
  • mc_filter[1] = mc_filter[0] = 0xffffffff
• else if ((mc_count > multicast_filter_limit) || (flags & IFF_ALLMULTI))
  • AcceptBroadcast | AcceptMulticast | AcceptMyPhys
  • mc_filter[1] = mc_filter[0] = 0xffffffff
• else
  • AcceptBroadcast | AcceptMulticast | AcceptMyPhys
• Diagram: each address mclist[0].dmi_addr, mclist[1].dmi_addr, mclist[2].dmi_addr goes through ether_crc(); bits 31~26 of the 32-bit result select one bit (63~0) of the 64-bit hash table
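The hash step in the diagram can be sketched in user space. Assumptions are labelled: ether_crc_sketch() is a hypothetical bitwise CRC-32 (polynomial 0x04c11db7, each octet fed least significant bit first) written to resemble what the kernel's ether_crc() is understood to compute, and mc_filter_add() is a hypothetical helper that uses the top 6 bits of the result as the bit number, as the diagram suggests.

```c
#include <assert.h>
#include <stdint.h>

#define ETH_ALEN 6

/* Bitwise big-endian CRC-32 over `length` octets (sketch, see lead-in). */
static uint32_t ether_crc_sketch(int length, const unsigned char *data)
{
    uint32_t crc = 0xFFFFFFFFu;

    assert(length >= 0);
    while (--length >= 0) {
        unsigned char octet = *data++;
        int bit;

        for (bit = 0; bit < 8; bit++, octet >>= 1) {
            uint32_t msb = crc >> 31; /* top bit before the shift */
            crc <<= 1;
            if (msb ^ (octet & 1u))
                crc ^= 0x04c11db7u;
        }
    }
    return crc;
}

/* Set the hash-table bit for one multicast address in the 64-bit filter. */
static void mc_filter_add(uint32_t mc_filter[2], const unsigned char addr[ETH_ALEN])
{
    unsigned bit_nr = ether_crc_sketch(ETH_ALEN, addr) >> 26; /* bits 31~26 */

    mc_filter[bit_nr >> 5] |= 1u << (bit_nr & 31);
}
```

Each address sets exactly one of the 64 bits, so different addresses can collide on the same bit; the hardware filter is therefore approximate and the network stack still has to discard unwanted groups.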
Transmit a packet: netdrv_start_xmit()
• Diagram: the descriptors are used in the order 0 1 2 3 0 1 2 …; dirty_tx marks the oldest descriptor not yet completed, cur_tx the next one to use

netdrv_start_xmit()
    if (skb->len < ETH_ZLEN)
        skb = skb_padto(skb, ETH_ZLEN);
    entry = atomic_read (&cur_tx) % NUM_TX_DESC;
    tx_info[entry].skb = skb;
    memcpy (tx_buf[entry], skb->data, skb->len);
    NETDRV_W32 (TxStatus[entry], tx_flag | skb->len);
    dev->trans_start = jiffies;
    atomic_inc (&cur_tx);
    if ((atomic_read (&cur_tx) - atomic_read (&dirty_tx)) >= NUM_TX_DESC)
        netif_stop_queue (dev);
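The cur_tx / dirty_tx bookkeeping can be modelled without any hardware. In this sketch struct tx_ring, tx_submit() and tx_complete() are hypothetical names; tx_submit() follows the index arithmetic of netdrv_start_xmit() above, and tx_complete() stands in for the completion step that the transmit interrupt performs.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_TX_DESC 4

/* Hypothetical model of the transmit ring state. */
struct tx_ring {
    unsigned cur_tx;    /* count of packets handed to the chip */
    unsigned dirty_tx;  /* count of packets already completed */
    bool queue_stopped; /* what netif_stop_queue()/wake_queue() would toggle */
};

/* Claim the next descriptor; stop the queue when all four are in flight. */
static unsigned tx_submit(struct tx_ring *r)
{
    unsigned entry;

    assert(r->cur_tx - r->dirty_tx < NUM_TX_DESC); /* a free slot must exist */
    entry = r->cur_tx % NUM_TX_DESC;
    r->cur_tx++;
    if (r->cur_tx - r->dirty_tx >= NUM_TX_DESC)
        r->queue_stopped = true;
    return entry;
}

/* Retire the oldest in-flight descriptor and let the queue run again. */
static unsigned tx_complete(struct tx_ring *r)
{
    unsigned entry;

    assert(r->cur_tx != r->dirty_tx); /* something must be in flight */
    entry = r->dirty_tx % NUM_TX_DESC;
    r->dirty_tx++;
    r->queue_stopped = false;
    return entry;
}
```

Because both counters only ever increase and are compared by unsigned subtraction, the number of descriptors in flight stays correct even after the counters wrap.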
Interrupt handling: netdrv_interrupt()

    spin_lock (&tp->lock);
    status = NETDRV_R16 (IntrStatus);
    NETDRV_W16_F (IntrStatus, status); // Acknowledge
    if (status & (PCIErr | PCSTimeout | RxUnderrun | RxOverflow | RxFIFOOver | TxErr | RxErr))
        netdrv_weird_interrupt (dev, tp, ioaddr, status, link_changed);
    if (status & (RxOK | RxUnderrun | RxOverflow | RxFIFOOver))
        netdrv_rx_interrupt (dev, tp, ioaddr);
    if (status & (TxOK | TxErr))
        netdrv_tx_interrupt (dev, tp, ioaddr);
    spin_unlock (&tp->lock);

• Spec says, "The ISR bits are always set to 1 if the condition is present."
• Spec says, "Reading the ISR clears all. Writing to the ISR has no effect."
• Diagram: ISR = 0 1 1 1 0 0 1 1 and IMR = 0 0 1 1 0 0 1 0 are combined to produce the interrupt
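The ISR/IMR diagram reads naturally as a bitwise AND: a status bit raises the interrupt only if the matching mask bit is set. That reading is an assumption about the diagram, and the function names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Status bits that are both pending in the ISR and enabled in the IMR. */
static uint16_t irq_pending_enabled(uint16_t isr, uint16_t imr)
{
    return (uint16_t)(isr & imr);
}

/* Nonzero when at least one enabled status bit is pending. */
static int irq_asserted(uint16_t isr, uint16_t imr)
{
    return irq_pending_enabled(isr, imr) != 0;
}
```

With the slide's bit patterns, ISR = 0x73 and IMR = 0x32, the enabled pending bits are 0x32, so the interrupt line is asserted.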
Interrupt handling: netdrv_tx_interrupt(dev, tp, ioaddr)

    dirty_tx = atomic_read (&tp->dirty_tx);
    cur_tx = atomic_read (&tp->cur_tx);
    tx_left = cur_tx - dirty_tx;
    while (tx_left > 0) {
        int entry = dirty_tx % NUM_TX_DESC;
        int txstatus = NETDRV_R32 (TxStatus[entry]);
        if (!(txstatus & (TxStatOK | TxUnderrun | TxAborted)))
            break; /* It still hasn't been Txed */
        if (txstatus & (TxOutOfWindow | TxAborted)) {
            /* There was a major error, log it. */
            tp->stats.tx_errors++;
        } else {
            if (txstatus & TxUnderrun) /* Add 64 to the Tx FIFO threshold. */
                tp->tx_flag += 0x00020000;
            tp->stats.tx_bytes += txstatus & 0x7ff;
            tp->stats.tx_packets++;
        }
        dev_kfree_skb_irq (tp->tx_info[entry].skb);
        tp->tx_info[entry].skb = NULL;
        dirty_tx++;
        if (netif_queue_stopped (dev))
            netif_wake_queue (dev);
        cur_tx = atomic_read (&tp->cur_tx);
        tx_left = cur_tx - dirty_tx;
    }
    atomic_set (&tp->dirty_tx, dirty_tx);
Interrupt handling, packet reception: netdrv_rx_interrupt (dev, tp, ioaddr)
• Diagram: each entry in the ring is laid out as status header, packet, CRC

    rx_ring = tp->rx_ring;
    cur_rx = tp->cur_rx;
    while ((NETDRV_R8 (ChipCmd) & RxBufEmpty) == 0) {
        ring_offset = cur_rx % RX_BUF_LEN;
        rx_status = le32_to_cpu (*(u32 *) (rx_ring + ring_offset));
        rx_size = rx_status >> 16;
        pkt_size = rx_size - 4;
        skb = dev_alloc_skb (pkt_size + 2);
        skb->dev = dev;
        skb_reserve (skb, 2); /* 16 byte align the IP fields. */
        eth_copy_and_sum (skb, &rx_ring[ring_offset + 4], pkt_size, 0);
        skb_put (skb, pkt_size);
        skb->protocol = eth_type_trans (skb, dev);
        netif_rx (skb);
        dev->last_rx = jiffies;
        tp->stats.rx_bytes += pkt_size;
        tp->stats.rx_packets++;
        cur_rx = (cur_rx + rx_size + 4 + 3) & ~3;
        NETDRV_W16_F (RxBufPtr, cur_rx - 16);
    }
    tp->cur_rx = cur_rx;