
Code Review for IPv4 Metarouter Header Format



Presentation Transcript


  1. Code Review for IPv4 Metarouter Header Format. Jing Lu, jl1@arl.wustl.edu

  2. Header Format
     [Diagram: pipeline blocks Rx, Substr Decap, Parse, Lookup, Header Format, QM, Tx]
     • Main functions:
       • Put on the MN Internal header (slow path), the tunnel frame header (IP/UDP header) and the Ethernet VLAN header based on:
         • Exception flags raised by the Parse block
           • TTL expired: bit 0 of exception flags
           • IP option: bit 1 of exception flags
         • Lookup result
           • Hit, Drop, Local delivery bits
           • If Rx UDP DPort = Tx UDP SPort, the packet should be redirected
       • Increment the pre-queue packet counter and byte counter for each incoming packet, based on the counter index
       • Update the buffer descriptor with the new buffer/packet size, buffer offset and counter index
       • Pass relevant fields to QM
     • NN communication
     • Single thread

  3. Where is the code
     • Dispatch loop:
       • IPv4_MR\src\dispatch_loop\PL\hdr_format_dl.[c,h]
       • IPv4_MR\src\dispatch_loop\PL\dl_source.[c,h]
       • IPv4_MR\src\dispatch_loop\PL\nn_rings.[c,h]
     • Header format:
       • IPv4_MR\src\hdr_format\PL\hdr_format.[c,h]
     • IPv4 header format:
       • IPv4_MR\src\ipv4\PL\ipv4_hdr_format.[c,h]
     • External dependencies:
       • Ring data format:
         • IPv4_MR/src/dispatch_loop/PL/ring_formats.h
       • System definitions and memory locations:
         • IPv4_MR/build/PL/dispatch_loop/dl_system.h

  4. Required Includes
     • Files
       • IXA_SDK_4.0\microengineC\src\intrinsic.c
       • IXA_SDK_4.0\microengineC\src\rtl.c
     • Directories
       • IXA_SDK_4.0\src\library\microblocks_library\microc\
       • IXA_SDK_4.0\MicroengineC\include\..\..\..\..\
       • IXA_SDK_4.0\src\library\dataplane_library\microc\
     • These are required to gain access to the buffer libraries and intrinsic functions.

  5. Input and Output
     [Diagram: Lookup -> Hdr Format -> QM, with the ring data carried on each link. Fields are grouped into 32-bit words by their bit widths.]

     Input (Lookup to Hdr Format):
       Buf Handle (32b)
       IP Pkt Length (16b) | IP Pkt Offset (16b)
       Slice ID (VLAN) (16b) | Rx UDP DPort (16b)
       Rsvd (1b) | H (1b) | LD (1b) | D (1b) | Exception Bits (12b) | Cntr Index (16b)
       Tx IP DAddr (32b)
       Tx UDP DPort (16b) | Tx UDP SPort (16b)
       DA (8b) | Port (4b) | QID (20b)
       Slice data pointer (32b)
       Rsv2 (12b) | Code opt (4b) | Rx UDP SPort (16b)
       Rx IP SAddr (32b)

     Output (Hdr Format to QM):
       Buf Handle (32b)
       Rsv_1 (4b) | QID (20b) | Port (4b) | Rsv_2 (4b)
       MN Frame Length (16b) | Cntr Index (16b)

     Legend: H: Hit; D: Drop; LD: Local Delivery; Exception[0]: TTL; Exception[1]: IP Option
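The flag word and the two QM output words can be unpacked and packed with shifts and masks. This is a sketch only: it assumes the fields are laid out most-significant-bit first in the order the slide lists them, which the transcript does not confirm, and the function and type names are hypothetical.

```c
#include <stdint.h>

/* Decoded view of the word: Rsvd(1) | H(1) | LD(1) | D(1) | Exception(12) | Cntr Index(16). */
typedef struct {
    unsigned hit;
    unsigned local_delivery;
    unsigned drop;
    uint16_t exceptions;   /* 12 exception bits */
    uint16_t cntr_index;
} hf_flags;

/* ASSUMPTION: Rsvd is bit 31 and fields follow MSB-first in slide order. */
static hf_flags hf_unpack_flags(uint32_t w)
{
    hf_flags f;
    f.hit            = (w >> 30) & 1u;
    f.local_delivery = (w >> 29) & 1u;
    f.drop           = (w >> 28) & 1u;
    f.exceptions     = (uint16_t)((w >> 16) & 0xFFFu);
    f.cntr_index     = (uint16_t)(w & 0xFFFFu);
    return f;
}

/* QM word: Rsv_1(4) | QID(20) | Port(4) | Rsv_2(4); reserved bits left at zero. */
static uint32_t qm_pack_queue_word(uint32_t qid, uint32_t port)
{
    return ((qid & 0xFFFFFu) << 8) | ((port & 0xFu) << 4);
}

/* QM word: MN Frame Length(16) | Cntr Index(16). */
static uint32_t qm_pack_length_word(uint16_t mn_frame_len, uint16_t cntr_index)
{
    return ((uint32_t)mn_frame_len << 16) | cntr_index;
}
```

For example, under this layout a queue ID of 0x12345 on port 3 packs to 0x01234530.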

  6. Initialization
     • Static configuration by XScale
       • Control block (12B)
         • Ethernet address
         • IP address (global IP)
       • Slice info table per slice (36B)
         • GPE IP address (local IP)
         • NPE IP address (local IP)
         • GPE Ethernet address
         • UDP SRC port
         • UDP DST port
         • Port
         • QID for local delivery
         • QID for exception packets

     typedef struct _hdr_format_control_block {
         unsigned int eth_addr_hi32;
         unsigned int eth_addr_lo16;
         unsigned int this_ip_addr;
     } hdr_format_control_block;

     typedef struct _hdr_format_slice_info_table {
         unsigned int gpe_ip_addr;
         unsigned int npe_ip_addr;
         unsigned int gpe_eth_addr_hi32;
         unsigned int gpe_eth_addr_lo16;
         unsigned int udp_src_port;
         unsigned int udp_dst_port;
         unsigned int port;
         unsigned int ld_qid;
         unsigned int excpt_qid;
     } hdr_format_slice_info_table;
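The 12B and 36B sizes quoted on this slide follow from three and nine 32-bit fields respectively. The sketch below mirrors the two structures with `uint32_t` (an assumption standing in for the microengine's 32-bit `unsigned int`) and checks the sizes at compile time. The two helpers are hypothetical: they assume a contiguous table indexed directly by slice, and that `hi32` holds the first four bytes of the Ethernet address while the low 16 bits of `lo16` hold the last two.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t eth_addr_hi32;
    uint32_t eth_addr_lo16;
    uint32_t this_ip_addr;
} hdr_format_control_block;

typedef struct {
    uint32_t gpe_ip_addr;
    uint32_t npe_ip_addr;
    uint32_t gpe_eth_addr_hi32;
    uint32_t gpe_eth_addr_lo16;
    uint32_t udp_src_port;
    uint32_t udp_dst_port;
    uint32_t port;
    uint32_t ld_qid;
    uint32_t excpt_qid;
} hdr_format_slice_info_table;

/* Sizes quoted on the slide. */
_Static_assert(sizeof(hdr_format_control_block) == 12, "control block is 12B");
_Static_assert(sizeof(hdr_format_slice_info_table) == 36, "slice info entry is 36B");

/* Hypothetical: byte offset of a slice's entry in a contiguous, directly indexed table. */
static size_t slice_entry_offset(size_t table_base, unsigned slice_index)
{
    return table_base + (size_t)slice_index * sizeof(hdr_format_slice_info_table);
}

/* Hypothetical: rebuild the 6-byte Ethernet address from the hi32/lo16 split. */
static void eth_addr_to_bytes(uint32_t hi32, uint32_t lo16, uint8_t out[6])
{
    out[0] = (uint8_t)(hi32 >> 24);
    out[1] = (uint8_t)(hi32 >> 16);
    out[2] = (uint8_t)(hi32 >> 8);
    out[3] = (uint8_t)hi32;
    out[4] = (uint8_t)(lo16 >> 8);
    out[5] = (uint8_t)lo16;
}
```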

  7. Global Variables
     • Externally defined global variables:
       • In hdr_format_dl.c
         • ring_in
         • ring_out
         • dlNextBlock
     • Initialization variables shared by all threads:
       • In hdr_format.c
         • this_ip_addr
         • eth_addr_hi32
         • eth_addr_lo16
         • partial_ip_cksum (computed on known IP header fields)
     • header_format_init() reads the control block in SRAM and initializes these variables
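The idea behind `partial_ip_cksum` can be illustrated with the standard one's-complement IPv4 header checksum: sum the fields that never change once at initialization, then add the per-packet fields and fold. The sketch below is not the reviewed code. It assumes the constant fields are version/header length/TOS (0x4500), zero ID/flags/fragment offset, a TTL of 64, protocol UDP and the source address, and that total length and destination address vary per packet; all function names are hypothetical.

```c
#include <stdint.h>

/* Add both 16-bit halves of a 32-bit value to a running one's-complement sum. */
static uint32_t cksum_add32(uint32_t sum, uint32_t v)
{
    return sum + (v >> 16) + (v & 0xFFFFu);
}

/* Fold carries back into the low 16 bits and complement the result. */
static uint16_t cksum_finish(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Computed once at init over fields ASSUMED constant for every packet. */
static uint32_t ip_partial_cksum(uint32_t src_addr)
{
    uint32_t sum = 0;
    sum += 0x4500u;            /* version 4, header length 5 words, TOS 0 */
    sum += (64u << 8) | 17u;   /* TTL 64 (assumed), protocol 17 = UDP */
    /* ID/flags/fragment offset assumed zero, so they add nothing. */
    return cksum_add32(sum, src_addr);
}

/* Per packet: add the varying fields to the stored partial sum and finish. */
static uint16_t ip_full_cksum(uint32_t partial, uint16_t total_len, uint32_t dst_addr)
{
    uint32_t sum = partial + total_len;
    sum = cksum_add32(sum, dst_addr);
    return cksum_finish(sum);
}
```

With this split, the per-packet work is three additions and a fold, regardless of how many header fields are constant.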

  8. Header Data Structure
     [Diagram: layout of the headers written in front of each packet. Annotations in the diagram: "Same for all pkts", "Vary per pkt".]
     • Ethernet VLAN Header (18B)
       • DstAddr (6B)
       • SrcAddr (6B)
       • Type = 802.1Q (2B)
       • VLAN (2B)
       • Type = IP (2B)
     • IP Header (20B)
       • Ver/HLen/Tos (2B)
       • Len (2B)
       • ID/Flags/FragOffset (4B)
       • TTL (1B)
       • Protocol = UDP (1B)
       • Hdr Cksum (2B)
       • Src Addr (4B)
       • Dst Addr (4B)
     • UDP Header (8B)
       • Src Port (2B)
       • Dst Port (2B)
       • UDP length (2B)
       • UDP checksum (2B)
     • MN Internal Header (8, 16B)
       • Rsvd, Type (4B)
       • hdr_length (2B)
       • Rx UDP DPort (2B)
       • Rx IP SAddr (4B)
       • Rx UDP SPort (2B)
       • Type dependent data (8B)
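The byte counts listed above for the Ethernet VLAN, IP and UDP headers can be checked with plain C structures. This is a host-side sketch with hypothetical type and field names; it describes sizes only and ignores byte order and the alignment of the IP header when it follows an 18-byte Ethernet header in memory.

```c
#include <stdint.h>

typedef struct {
    uint8_t  dst_addr[6];
    uint8_t  src_addr[6];
    uint16_t tpid;        /* Type = 802.1Q */
    uint16_t vlan;
    uint16_t ethertype;   /* Type = IP */
} eth_vlan_hdr;

typedef struct {
    uint16_t ver_hlen_tos;
    uint16_t len;
    uint32_t id_flags_fragoff;
    uint8_t  ttl;
    uint8_t  protocol;    /* UDP */
    uint16_t hdr_cksum;
    uint32_t src_addr;
    uint32_t dst_addr;
} ip_hdr;

typedef struct {
    uint16_t src_port;
    uint16_t dst_port;
    uint16_t length;
    uint16_t checksum;
} udp_hdr;

/* Every field sits on its natural alignment, so no padding is expected. */
_Static_assert(sizeof(eth_vlan_hdr) == 18, "Ethernet VLAN header is 18B");
_Static_assert(sizeof(ip_hdr) == 20, "IP header is 20B");
_Static_assert(sizeof(udp_hdr) == 8, "UDP header is 8B");

/* Combined size of the three headers, excluding the MN Internal header. */
enum { TUNNEL_HDR_BYTES = sizeof(eth_vlan_hdr) + sizeof(ip_hdr) + sizeof(udp_hdr) };
```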

  9. Function and Performance
     Processing cycles are given as common case/worst case.

     Function                                 Memory access          Processing cycles
     Dequeue ring_in data                     NN: 9W reads           42/42
     Construct MN int hdr                                            44/86
     Construct IP, UDP, Ethernet, VLAN hdr                           64/73
     Set IP checksum                                                 12/12
     Set UDP checksum                                                11/11
     Write hdr to DRAM                        DRAM: 46-58B writes    37/40
     Inc Pre_queue Cnt                        SRAM: 8B writes        15/15
     Update buffer descriptor                 SRAM: 10B writes       66/66
     Enqueue ring_out data                    NN: 3W writes          27/27
     Total                                                           318/372

  10. Performance
     • 372 cycles for CPU processing
     • ~1300 cycles latency
     • Expected performance for a 90B minimum IPv4 packet (78B min IPv4 MN + 12B IFS):
       • (201/372) * 5 Gbps = 2.7 Gbps
     • To achieve 5 Gbps, two MEs need to run in parallel
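The arithmetic on this slide can be reproduced in a few lines. One plausible reading, offered here as an assumption, is that 201 is the per-packet cycle budget at line rate: a 90B packet at 5 Gbps occupies 144 ns, which is about 201 cycles on a 1.4 GHz microengine. The clock rate is not stated in the slides, and the function names are hypothetical.

```c
/* Cycles available per packet at line rate. */
static double cycle_budget(double pkt_bytes, double line_rate_bps, double me_clock_hz)
{
    return pkt_bytes * 8.0 / line_rate_bps * me_clock_hz;
}

/* Throughput one ME can sustain when it needs used_cycles per packet. */
static double achievable_gbps(double budget_cycles, double used_cycles, double line_rate_gbps)
{
    return budget_cycles / used_cycles * line_rate_gbps;
}

/* Number of MEs needed to stay within the per-packet budget (rounded up). */
static unsigned mes_needed(unsigned used_cycles, unsigned budget_cycles)
{
    return (used_cycles + budget_cycles - 1) / budget_cycles;
}
```

Under these assumptions, `cycle_budget(90, 5e9, 1.4e9)` is about 201.6, `achievable_gbps(201, 372, 5)` is about 2.7, and `mes_needed(372, 201)` is 2, which matches the figures on the slide.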

  11. IPv4 Internal Header Format
     [Diagram: layout of the IPv4 MN Internal header]
     • Type (28b) | 0000
     • Length (2B) | Rx UDP DPort (2B)
     • Type Dependent Data (8B); fields shown in the diagram:
       • Rx IP SAddr (4B)
       • Rx UDP SPort (2B)
       • Tx UDP DPort (2B)
       • Tx IP DAddr (4B)
       • Tx UDP SPort (2B)
     • FwdKey = [Tx UDP DPort + Tx UDP SPort + Tx IP DAddr]

  12. Construct IPv4 MN Internal header
     [Flowchart; the exact branch wiring is not recoverable from the transcript]
     • Decisions: Drop bit set? Hit bit set? TTL expired? Local DL? IP option? Redirect?
     • Actions:
       • Set NR bit in type
       • TTL expired: Set TTL bit in type; Set Rx UDP DPort; Length = 4 (appears on two branches)
       • Local DL: Set LD bit in type; Set Rx UDP DPort; Length = 4
       • IP option: Set OPT bit in type; Set Rx UDP DPort; Set TypeDependData; Length = 12
       • Redirect: Set RD bit in type; Set Rx UDP DPort; Set TypeDependData; Length = 12
       • return
     • 86 cycles for the worst case
     • 44 cycles for the common case
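One possible reading of this flowchart is sketched below in plain C. Several points are assumptions rather than facts from the slides: the bit positions in the Type field are invented for illustration, the NR bit is assumed to be set on a lookup miss, the tests are assumed to run in the order TTL, Local DL, IP option, Redirect with the first match winning, and the length for a miss without TTL expiry is left at zero because the transcript does not give it. All names are hypothetical.

```c
#include <stdint.h>

/* HYPOTHETICAL bit positions; the slides name the bits but not where they sit. */
enum {
    MN_TYPE_TTL = 1u << 0,
    MN_TYPE_LD  = 1u << 1,
    MN_TYPE_OPT = 1u << 2,
    MN_TYPE_RD  = 1u << 3,
    MN_TYPE_NR  = 1u << 4
};

typedef struct {
    int      dropped;        /* Drop bit was set; no header is built */
    uint32_t type;
    uint16_t length;
    uint16_t rx_udp_dport;
    int      has_type_data;  /* Type dependent data must be filled in */
} mn_int_hdr;

static mn_int_hdr mn_build(int drop, int hit, int ttl_expired, int local_dl,
                           int ip_option, int redirect, uint16_t rx_udp_dport)
{
    mn_int_hdr h = {0};

    if (drop) {
        h.dropped = 1;
        return h;
    }
    if (!hit)
        h.type |= MN_TYPE_NR;            /* assumed meaning: lookup miss */

    if (ttl_expired) {                   /* tested on both the hit and miss paths */
        h.type |= MN_TYPE_TTL;
        h.rx_udp_dport = rx_udp_dport;
        h.length = 4;
        return h;
    }
    if (!hit)
        return h;                        /* length for this case is not in the transcript */

    if (local_dl) {
        h.type |= MN_TYPE_LD;
        h.rx_udp_dport = rx_udp_dport;
        h.length = 4;
    } else if (ip_option) {
        h.type |= MN_TYPE_OPT;
        h.rx_udp_dport = rx_udp_dport;
        h.has_type_data = 1;
        h.length = 12;
    } else if (redirect) {
        h.type |= MN_TYPE_RD;
        h.rx_udp_dport = rx_udp_dport;
        h.has_type_data = 1;
        h.length = 12;
    }
    return h;
}
```

A chain like this reaches the Redirect test only after the other tests fail, which would be consistent with the gap between the common-case and worst-case cycle counts, though the slides do not say which path is the worst case.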

  13. Testing MR Header Format
     [Diagram: Stub Parse -> Dummy Lookup -> Hdr Format, with the ring data carried on each link. Fields are grouped into 32-bit words by their bit widths.]

     Stub Parse to Dummy Lookup:
       Buf Handle (32b)
       IP Pkt Length (16b) | IP Pkt Offset (16b)
       Lookup Key[143-112]: Slice ID/Rx UDP DPort (32b)
       Lookup Key[111-80]: DA (32b)
       Lookup Key[79-48]: SA (32b)
       Lookup Key[47-16]: Ports (32b)
       L Flags (4b) | Exception Bits (12b) | Lookup Key[15-0]: Proto/TCP_Flags (16b)
       Slice Data Ptr (32b)
       Rsv2 (12b) | Code opt (4b) | Rx UDP SPort (16b)
       Rx IP SAddr (32b)

     Dummy Lookup to Hdr Format:
       Buf Handle (32b)
       IP Pkt Length (16b) | IP Pkt Offset (16b)
       Slice ID (VLAN) (16b) | Rx UDP DPort (16b)
       Rsvd (1b) | H (1b) | LD (1b) | D (1b) | Exception Bits (12b) | Cntr Index (16b)
       Tx IP DAddr (32b)
       Tx UDP DPort (16b) | Tx UDP SPort (16b)
       DA (8b) | Port (4b) | QID (20b)
       Slice data pointer (32b)
       Rsv2 (12b) | Code opt (4b) | Rx UDP SPort (16b)
       Rx IP SAddr (32b)

     Legend: H: Hit; D: Drop; LD: Local Delivery; Exception[0]: TTL; Exception[1]: IP Option

     • The Dummy Lookup block enumerates all combinations of the five bits and generates the corresponding NN ring data to Hdr Format.
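Enumerating the five bits gives 32 test cases. The sketch below generates the flag word for each case, assuming the five bits are H, LD, D, Exception[0] and Exception[1], and assuming an MSB-first layout of Rsvd | H | LD | D | Exception Bits | Cntr Index within the word. Neither assumption is confirmed by the transcript, and the function names are hypothetical.

```c
#include <stdint.h>

#define NUM_COMBOS 32u   /* 2^5 combinations of the five bits */

/* Build the flag word for one combination (0..31).
 * combo bit 4 = H, bit 3 = LD, bit 2 = D, bits 1..0 = Exception[1..0]. */
static uint32_t dummy_lookup_flags_word(unsigned combo, uint16_t cntr_index)
{
    uint32_t h   = (combo >> 4) & 1u;
    uint32_t ld  = (combo >> 3) & 1u;
    uint32_t d   = (combo >> 2) & 1u;
    uint32_t exc = combo & 3u;

    return (h << 30) | (ld << 29) | (d << 28) | (exc << 16) | cntr_index;
}

/* Fill out[] with one flag word per combination; returns the number written. */
static unsigned dummy_lookup_generate(uint32_t out[NUM_COMBOS], uint16_t cntr_index)
{
    unsigned combo;
    for (combo = 0; combo < NUM_COMBOS; combo++)
        out[combo] = dummy_lookup_flags_word(combo, cntr_index);
    return NUM_COMBOS;
}
```

A test harness could feed each generated word to Hdr Format and compare the resulting headers against the expected outcome for that combination.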

  14. Possible Optimizations
     Processing cycles are given as common case/worst case; savings are in cycles.

     Function                                 Memory access          Processing cycles   Savings   Optimization
     Dequeue ring_in data                     NN: 9W reads           42/42               -10       More efficient Dequeue
     Construct MN int hdr                                            44/86               -15       Reduce redundant assignments for worst case
     Construct IP, UDP, Ethernet, VLAN hdr                           64/73               -20       Static fields only initialized by the first packet in each thread
     Set IP checksum                                                 12/12
     Set UDP checksum                                                11/11
     Write hdr to DRAM                        DRAM: 46-58B writes    37/40
     Inc Pre_queue Cnt                        SRAM: 8B writes        15/15               -6        Aligned SRAM writes, use assembler
     Update buffer descriptor                 SRAM: 10B writes       66/66               -30       Similar to DRAM writes
     Enqueue ring_out data                    NN: 3W writes          27/27
     Total                                                           318/372             -81

  15. Implementation Status
     • Add dynamic statistics
       • Packet counter for fast path packets
       • Packet counter for exception path packets
       • Packet counter per exception case
     • Decide which field in the buffer descriptor stores the counter index
     • Run 8-thread simulation
