
Introduction


  1. Introduction • Linus Svensson • D5, linus@sm.luth.se • Åke Östmark • D5, ake@sm.luth.se

  2. Why We Are Here • The architecture of a Network Processor Unit (NPU) • Master’s thesis - a joint operation between Luleå University of Technology and SwitchCore AB

  3. Today's Topics • Background • Ethernet and internetworks • Switches and routers • NPU (Network Processor Unit) • Why an NPU? • Pros and cons of NPUs • The architecture of our NPU • Design difficulties and design choices • The architecture, strengths and weaknesses • The big picture • From idea to silicon

  4. Ethernet • Most widespread network technology used in LANs (Local Area Networks) • 10 Mb/s (Ethernet) • 100 Mb/s (Fast Ethernet) • 1000 Mb/s (Gigabit Ethernet) • Packet-switched network • Host-to-host delivery on the same network • Switches forward packets from one section to another using the datagram paradigm

  5. Ethernet • Datagram paradigm • Packet contains enough information for a switch to forward it correctly • I.e. packet contains complete destination address • Ethernet packets = frames • In Ethernet the packets are referred to as frames

  6. Ethernet Frame Format • Preamble • 64 bits used for synchronisation • Header • 48-bit globally unique destination address • 48-bit globally unique source address • 16-bit type field used for classification

  7. Ethernet Frame Format • Body • 46-1500 bytes of data • CRC • 32-bit CRC (Cyclic Redundancy Check) for error detection
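
The frame layout on slides 6 and 7 can be sketched in Python. This is only an illustration, not code from the thesis: the names `parse_ethernet_header`, `format_mac`, and `frame_crc32` are hypothetical, the frame is assumed to arrive with the preamble and CRC already stripped, and `zlib.crc32` is used because it is based on the same CRC-32 polynomial as the Ethernet check field.

```python
import struct
import zlib

# 48-bit destination, 48-bit source, 16-bit type field (network byte order).
ETH_HEADER = struct.Struct("!6s6sH")
MIN_BODY, MAX_BODY = 46, 1500  # body size limits in bytes


def parse_ethernet_header(frame: bytes):
    """Split a frame (preamble and CRC already removed) into its fields."""
    dst, src, eth_type = ETH_HEADER.unpack_from(frame)
    body = frame[ETH_HEADER.size:]
    if not MIN_BODY <= len(body) <= MAX_BODY:
        raise ValueError(f"body of {len(body)} bytes is outside 46-1500")
    return dst, src, eth_type, body


def format_mac(addr: bytes) -> str:
    """Render a 6-byte address in the usual colon-separated hex form."""
    return ":".join(f"{b:02x}" for b in addr)


def frame_crc32(data: bytes) -> int:
    """32-bit CRC over the given bytes."""
    return zlib.crc32(data) & 0xFFFFFFFF


# Example frame: broadcast destination, made-up source, type 0x0800 (IP).
dst = bytes.fromhex("ffffffffffff")
src = bytes.fromhex("020000000001")
frame = ETH_HEADER.pack(dst, src, 0x0800) + bytes(46)

d, s, t, body = parse_ethernet_header(frame)
print(format_mac(d), format_mac(s), hex(t), len(body), hex(frame_crc32(frame)))
```

A switch only needs the first six bytes (the destination address) to make its forwarding decision, which is what the datagram paradigm on slide 5 relies on.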

  8. Internetworks • Internetwork • Several physical networks combined into one logical internetwork • Also called internet (with lowercase “i”) • Most famous is the world spanning Internet (with capital “I”) • Host-to-host delivery between different networks

  9. Internet Protocol (IP) • Most widespread protocol used in internetworks • Routers forward packets from one network to another using the datagram paradigm

  10. IP Packet Format • 12 bytes of status fields e.g. version, length etc • 32-bit globally unique source address • 32-bit globally unique destination address • Optional fields of variable length • Body
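
As a rough sketch of the layout above, the fixed part of an IPv4 header can be unpacked with Python's `struct` module. The function name `parse_ipv4` and the example addresses are made up for illustration, and the header checksum is deliberately not verified here.

```python
import ipaddress
import struct

# 12 bytes of fixed fields, followed by the 32-bit source and destination.
IPV4_FIXED = struct.Struct("!BBHHHBBH4s4s")


def parse_ipv4(packet: bytes) -> dict:
    """Decode the fixed header fields, the options, and the body."""
    (ver_ihl, _tos, total_len, _ident, _frag,
     ttl, proto, _checksum, src, dst) = IPV4_FIXED.unpack_from(packet)
    header_len = (ver_ihl & 0x0F) * 4  # header length is stored in 32-bit words
    return {
        "version": ver_ihl >> 4,
        "header_len": header_len,
        "total_len": total_len,
        "ttl": ttl,
        "protocol": proto,
        "src": ipaddress.IPv4Address(src),
        "dst": ipaddress.IPv4Address(dst),
        "options": packet[IPV4_FIXED.size:header_len],  # variable length
        "body": packet[header_len:total_len],
    }


# Example: 20-byte header without options, 4 bytes of body (checksum left 0).
header = IPV4_FIXED.pack(
    0x45, 0, 24, 0, 0, 64, 17, 0,
    ipaddress.IPv4Address("10.0.0.1").packed,
    ipaddress.IPv4Address("192.168.1.7").packed,
)
fields = parse_ipv4(header + b"data")
print(fields["src"], "->", fields["dst"], fields["body"])
```

The destination address sits at a fixed offset, so a router can start its lookup before the optional fields and the body have been examined.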

  11. IP Over Ethernet • IP packets are encapsulated in Ethernet frames

  12. Host-To-Host Communication

  13. Devices • SwitchCore CXE-2010 • A 16-port Gigabit Ethernet Switch-on-a-chip • Full 4K VLAN support • Includes support of IEEE 802.1p • Cisco 1710 • Security Access Router • Secure Internet, intranet, and extranet access with VPN and firewall • Advanced QoS features

  14. Features • What if we want: • Load Balancing • distributing client requests across multiple servers • Multi-Protocol Label Switching (MPLS) • next hop chosen based on the label

  15. Features • What if we don’t want: • QoS • Security features • The Network Processor Unit (NPU) • A programmable CPU chip that is optimized for networking and communications functions • Quick adaptation to new standards/features

  16. Conditions For the Work • 1 GE (1000 Mbit/s) port • 8 FE (100 Mbit/s) ports • Scalable • Add more ports • Remove ports • Feasible to make an ASIC prototype

  17. NPU components: • Processor Core • Embedded software • Network Interface • Packet buffers • Queues • Tables • Switch fabric

  18. Design Choices • Processor core • RISC based • Network specific • Network Interface • FE • MII (Media Independent Interface) • RMII (Reduced MII) • GE • GMII (Gigabit MII) • RGMII (Reduced GMII)

  19. Design Choices • Queues • A packet ready for transmission • Tables • Data structure for IP & MAC addresses • Switch fabric • The internal interconnect architecture. How to transport from in-port to out-port?

  20. Design Choices • Packet buffers • Internal and/or external • How many times do we need to access a (buffer) memory? • Write the packet when it is received from the network • Read the packet for processing • Write the modified packet for transmission • Read the packet when transmitting • For N ports the memory needs to run at 4N times the port speed
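
The memory-bandwidth argument above can be written out as a small calculation. This is a sketch under the slide's own model (four buffer accesses per packet, all ports running at line rate); the function name `required_memory_bandwidth` is hypothetical, and the port mix is taken from slide 16.

```python
# Write on receive, read for processing, write back, read for transmission.
ACCESSES_PER_PACKET = 4


def required_memory_bandwidth(port_speeds_mbps):
    """Worst-case shared-buffer bandwidth in Mbit/s with every port busy."""
    return ACCESSES_PER_PACKET * sum(port_speeds_mbps)


equal_ports = required_memory_bandwidth([100] * 8)          # N = 8 FE ports
thesis_ports = required_memory_bandwidth([100] * 8 + [1000])  # 8 FE + 1 GE

print(equal_ports, "Mbit/s for 8 FE ports")
print(thesis_ports, "Mbit/s for 8 FE ports plus 1 GE port")
```

For N equal ports the result reduces to 4N times the port speed, which is the figure quoted on the slide.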

  21. Design Choices • 8 FE ports • 1 GE port • Inter-arrival time: • 1.5*10^6 + 8*1.5*10^5 = 2.7*10^6 packets/s • -> New packet every 370 ns • Cycle budget example: • 100 MHz -> 37 cycles to process every packet • 200 MHz -> 74 cycles to process every packet
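
The packet-rate and cycle-budget figures on this slide can be reproduced with a short calculation. The slide does not say where its per-port rates come from; the sketch below assumes they are worst-case rates for minimum-size frames (64 bytes plus the 8-byte preamble and a 12-byte inter-frame gap), which rounds to the values used on the slide. All function names are hypothetical.

```python
PREAMBLE_BYTES = 8         # 64-bit preamble (slide 6)
MIN_FRAME_BYTES = 64       # 14-byte header + 46-byte body + 4-byte CRC
INTERFRAME_GAP_BYTES = 12  # assumption: standard idle time between frames


def max_packet_rate(link_bps):
    """Packets per second when a link carries only minimum-size frames."""
    bits = 8 * (PREAMBLE_BYTES + MIN_FRAME_BYTES + INTERFRAME_GAP_BYTES)
    return link_bps / bits


def cycle_budget(clock_hz, packets_per_s):
    """Clock cycles available per packet for a single processor."""
    return clock_hz / packets_per_s


exact_rate = max_packet_rate(1e9) + 8 * max_packet_rate(100e6)
slide_rate = 1.5e6 + 8 * 1.5e5  # rounded per-port rates, as on the slide

print(f"exact rate:    {exact_rate:,.0f} packets/s")
print(f"slide rate:    {slide_rate:,.0f} packets/s")
print(f"inter-arrival: {1e9 / slide_rate:.0f} ns")
for mhz in (100, 200):
    print(f"{mhz} MHz -> {cycle_budget(mhz * 1e6, slide_rate):.0f} cycles")
```

With only a few dozen cycles per packet on a single core, the budget is far below the roughly 200 cycles that forwarding is estimated to need on the next slide.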

  22. Design Choices • Model of operation • Route processing • Packet forwarding: ~200 cycles • Special services • Target technology • ~150 MHz

  23. Design Decisions: Parallel Processor Architecture • 2 FE ports • 125 MHz • 1 Integer Unit • 1 GE port • 125 MHz • 5 Integer Units • -> Cycle budget of 420 for each packet • Interactive voice can tolerate somewhere between 100 and 200 milliseconds of end-to-end delay without people noticing it. • 420 cycles -> 0.00336 ms
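
The budget of 420 cycles can be checked with the sketch below. It assumes, as an interpretation of the slide, that the Integer Units of a group work on different packets in parallel, and that packet rates are worst-case rates for minimum-size frames (8-byte preamble, 64-byte frame, 12-byte inter-frame gap). The function name `budget_per_packet` is hypothetical.

```python
CLOCK_HZ = 125e6
# Assumed worst case: preamble + minimum frame + inter-frame gap, in bits.
BITS_PER_MIN_PACKET = 8 * (8 + 64 + 12)


def budget_per_packet(integer_units, port_bps, ports=1):
    """Cycles per packet when the units process packets in parallel."""
    packets_per_s = ports * port_bps / BITS_PER_MIN_PACKET
    return integer_units * CLOCK_HZ / packets_per_s


fe_budget = budget_per_packet(integer_units=1, port_bps=100e6, ports=2)
ge_budget = budget_per_packet(integer_units=5, port_bps=1e9)
latency_ms = 420 / CLOCK_HZ * 1e3  # time spent on 420 cycles at 125 MHz

print(f"2 FE ports, 1 IU: {fe_budget:.0f} cycles per packet")
print(f"1 GE port, 5 IUs: {ge_budget:.0f} cycles per packet")
print(f"processing delay: {latency_ms:.5f} ms")
```

Both groupings land on the same budget, and the resulting processing delay is several orders of magnitude below the 100 to 200 ms that interactive voice tolerates.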

  24. Design Decisions • Tables • MAC Address lookup, fixed length: • CAM (Content Addressable Memory) • Pros: Fast • Cons: Expensive • Like a cache • IP Address lookup, longest match: • Possibly large table • External SRAM
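
The two lookup styles above differ in kind: a MAC lookup is an exact match on a fixed-length key, while an IP lookup must find the longest matching prefix. The following sketch shows the difference in software only. A `dict` stands in for the CAM, a linear scan stands in for whatever structure would live in external SRAM, and all table entries and names are made up for illustration.

```python
import ipaddress

# Exact match on a fixed-length key (the dict plays the role of the CAM).
mac_table = {
    "02:00:00:00:00:01": 1,
    "02:00:00:00:00:02": 3,
}

# Routing entries of different prefix lengths, including a default route.
routes = [
    (ipaddress.ip_network("0.0.0.0/0"), "port 0"),
    (ipaddress.ip_network("10.0.0.0/8"), "port 1"),
    (ipaddress.ip_network("10.1.0.0/16"), "port 2"),
]


def lookup_mac(mac):
    """Return the output port for a MAC address, or None if unknown."""
    return mac_table.get(mac)


def lookup_route(address):
    """Return the next hop of the most specific prefix containing address."""
    addr = ipaddress.ip_address(address)
    matches = [(net.prefixlen, hop) for net, hop in routes if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0])[1]


print(lookup_mac("02:00:00:00:00:02"))
print(lookup_route("10.1.2.3"), lookup_route("10.2.0.1"), lookup_route("8.8.8.8"))
```

The address 10.1.2.3 matches all three prefixes, and the lookup has to pick the /16 entry; this need to compare candidates is what makes IP lookup harder than the fixed-length MAC case.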

  25. Internal packet buffers: • Pros: Fast, less pin count • Cons: Limited size of memory • 2 FE ports / 1 buffer • Pros: Reduce contention, reduce 4N problem • Cons: Less effective use of memory

  26. Virtual output queues: • Pros: No Head Of Line (HOL) blocking, Possible to select any packet from buffer memory • Cons: Expensive in hardware
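
To illustrate why virtual output queues avoid head-of-line blocking, here is a minimal software model. It is a sketch only: the class `VirtualOutputQueues` and its methods are hypothetical and say nothing about how the queues are built in hardware.

```python
from collections import deque


class VirtualOutputQueues:
    """One FIFO per output port, instead of a single FIFO per input."""

    def __init__(self, num_outputs):
        self.queues = [deque() for _ in range(num_outputs)]

    def enqueue(self, packet, out_port):
        self.queues[out_port].append(packet)

    def dequeue_for(self, free_outputs):
        """Return (out_port, packet) for the first free output with traffic."""
        for port in free_outputs:
            if self.queues[port]:
                return port, self.queues[port].popleft()
        return None


voq = VirtualOutputQueues(num_outputs=3)
voq.enqueue("A", out_port=0)  # arrived first, but output 0 is busy
voq.enqueue("B", out_port=2)  # arrived second, output 2 is free

picked = voq.dequeue_for(free_outputs=[1, 2])
print(picked)
```

With a single FIFO, packet B would have to wait behind packet A even though its output is idle. The price is one queue per output port at every input, which is the hardware cost noted on the slide.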

  27. NPU Architecture

  28. Performance

  29. Strengths in the Architecture • More bandwidth • More RU and TU • New types of RU and TU • More processing power • More PU per RU/TU • More IU per PU • New types of PU • New types of IU

  30. Strengths in the Architecture • New functionality • New types of shared resources • Semaphores • Multipurpose CPU • New software • All IUs can run different software

  31. Weaknesses in the Architecture • Not everything scales well • Shared resources • No. of IUs in a PU

  32. From Idea to Silicon • ASIC design flow

  33. Layout

      -- Combinational ALU: the operation is selected by the execute-stage control word.
      ALU : process(alu_RegA, alu_RegB, In_Ctrl_Ex)
      begin
        case In_Ctrl_Ex.OP is
          when ALU_ADD => alu_Result <= alu_RegA + alu_RegB;
          when ALU_SUB => alu_Result <= alu_RegA - alu_RegB;
          when ALU_AND => alu_Result <= alu_RegA and alu_RegB;
          when ALU_OR  => alu_Result <= alu_RegA or alu_RegB;
          when ALU_XOR => alu_Result <= alu_RegA xor alu_RegB;
          when ALU_NOR => alu_Result <= alu_RegA nor alu_RegB;
          -- Unknown opcode: drive don't-care so synthesis is free to optimise.
          when others  => alu_Result <= (others => '-');
        end case;
      end process;
